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CHAPTER 1 
GUIDE TO THIS MANUAL 


This manual describes the embedded Intel486™ processors. It is intended for use by hardware 
designers familiar with the principles of embedded microprocessors and with the Intel486 pro- 


cessor architecture. 


1.1. MANUAL CONTENTS 


This manual contains 12 chapters, seven appendices, a glossary, and an index. This section sum- 
marizes the contents of the remaining chapters and appendixes. The remainder of this chapter de- 
scribes notation conventions and special terminology used throughout the manual and provides 
references to related documentation. 


Chapter 2: 
“Embedded 
Intel486™ Processor 
Overview” 


Chapter 3: 
“Architectural 
Overview” 


Chapter 4: 
“System Register 
Organization” 


Chapter 5: 
“Real Mode 
Architecture” 


Chapter 6: 
“Protected Mode 
Architecture” 


Chapter 7: 
“On-Chip Cache” 


Chapter 8: 

“System Management 
Mode (SMM) 
Architectures” 


This chapter provides an overview of the current embedded Intel486 
processor family, including product features, system components, sys- 
tem architecture, and applications. This chapter also lists product fre- 
quency, voltage, and package offerings. 


This chapter describes the Intel486 processor internal architecture, with 
an overview of the processor’s functional units. 


This chapter details the Intel486 processor register set, including the 
base architecture registers, system-level registers, and debug and test 
registers. 


When the processor is powered-up, it is initialized in Real Mode, which 
is an architecture similar to the 8086 processor. This chapter describes 
Real Mode addressing and interrupt handling. 


This chapter describes Protected Mode, which is an architecture that al- 
lows the Intel486 processor to access up to four Gbytes of address space. 


All members of the Intel486 processor family contain an on-chip cache, 
also known as L1 cache. This chapter describes its functionality. 


This chapter describes the System Management Mode architecture of 
the Intel486 processors, including System Management Mode interrupt 
processing and programming. 
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Chapter 9: 
“Hardware Interface” 


Chapter 10: 
“Bus Operation” 


Chapter 11: 
“Debugging Support” 


Chapter 12: 
“Instruction Set 
Summary” 


Appendix A: 
“Signal Descriptions” 


Appendix B: 
“Testability” 


Appendix C: 
“Advanced Features” 


Appendix D: 
“Feature 
Determination” 


Appendix E: 
“1/O Buffer Models” 


Appendix F: 
“BSDL Listings” 


Appendix G: 
“System Design 
Notes” 
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This chapter describes the hardware interface of the current Intel486 
processor family, including signal descriptions, interrupt interfaces, 
write buffers, reset and initialization, and clock control. Use and opera- 
tion of the Stop Clock, Auto HALT Power Down and other power-sav- 
ing SL Technology features are described. 


The IntelDX4 processor speed multiplying options and the Write-Back 
Enhanced processor signals are also detailed in this chapter. 


This chapter describes the features of the processor bus, including bus 
cycle handling, interrupt and reset signals, cache control, and floating- 
point error control. 


This chapter describes the Intel486 processor debugging support, in- 
cluding the breakpoint instruction, single-step trap, and debug registers. 


This chapter describes the Intel486 processor instruction set and the en- 
coding of each field within the instructions. 


This appendix lists each Intel486 processor signal and describes its func- 
tion. 


This appendix describes the testability of the Intel486 processors, in- 
cluding the built-in self test (BIST), on-chip cache testing, translation 
lookaside buffer (TLB) testing, three-state output test mode, and bound- 
ary scan (JTAG). Processor-specific differences regarding the Write- 
Back Enhanced Intel486 processors are documented in this chapter. A 
complete listing of Boundary Scan ID Codes and Boundary Scan Regis- 
ter Bits orders is also included. 


This appendix documents the advanced features of the Intel486 proces- 
sor family not covered in other chapters. 


This appendix documents the CPUID function, which is used to deter- 
mine the Intel486 processor family identification and processor-specific 
information. 


This appendix provides a detailed sample listing of the types of I/O buff- 
er modeling information available for the Intel486 processor family. 


This appendix provides a sample listing of a BSDL file for the Intel486 
processor family. 


This appendix provides design notes applicable to the use of System 
Management Mode and SMM routines with the Intel486 processor. 
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1.2 NOTATION CONVENTIONS 


The following notations are used throughout this manual. 


# The pound symbol (#) appended to a signal name indicates that the signal 
is active low. 

Variables Variables are shown in italics. Variables must be replaced with correct 
values. 

New Terms New terms are shown in italics. 

Instructions Instruction mnemonics are shown in upper case. When you are 


programming, instructions are not case-sensitive. You may use either 
upper or lower case. 


Numbers Hexadecimal numbers are represented by a string of hexadecimal digits 
followed by the character H. A zero prefix is added to numbers that begin 
with A through F. (For example, FF is shown as OFFH.) Decimal and 
binary numbers are represented by their customary notations. (That is, 
255 is a decimal number and 1111 1111 is a binary number. In some 
cases, the letter B is added for clarity.) 


Units of Measure —_ The following abbreviations are used to represent units of measure: 


A amps, amperes 

mA milliamps, milliamperes 
pA microamps, microamperes 
Mbyte megabytes 

Kbyte kilobytes 

Gbyte gigabyte 

W watts 

KW kilowatts 

mW milliwatts 

yW microwatts 

MHz megahertz 

ms milliseconds 

ns nanoseconds 

Ls microseconds 

uF microfarads 

pF picofarads 

Vv volts 
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Register Bits When the text refers to more that one bit, the range of bits is represented 
by the highest and lowest numbered bits, separated by a colon (example: 
A[15:8]). The first bit shown (15 in the example) is the most-significant 
bit and the second bit shown (8) is the least-significant bit. 


Register Names Register names are shown in upper case. If a register name contains a 
lower case, italic character, it represents more than one register. For 
example, PnCFG represents three registers: P1ICFG, P2CFG, and P3CFG. 


Signal Names Signal names are shown in upper case. When several signals share a 
common name, an individual signal is represented by the signal name 
followed by a number, whereas the group is represented by the signal 
name followed by a variable (n). For example, the lower chip select 
signals are named CSO#, CS1#, CS2#, and so on; they are collectively 
called CSn#. A pound symbol (#) appended to a signal name identifies an 
active-low signal. Port pins are represented by the port abbreviation, a 
period, and the pin number (e.g., P1.0, P1.1). 


1.3 SPECIAL TERMINOLOGY 


The following terms have special meanings in this manual. 


Assert and De-assert The terms assert and de-assert refer to the act of making a signal 
active and inactive, respectively. The active polarity (high/low) is 
defined by the signal name. Active-low signals are designated by the 
pound symbol (#) suffix; active-high signals have no suffix. To 
assert RD# is to drive it low; to assert HOLD is to drive it high; to 
de-assert RD# is to drive it high; to de-assert HOLD is to drive it 
low. 


DOS I/O Address Peripherals that are compatible with PC/AT system architecture can 
be mapped into DOS (or PC/AT) addresses 0H—O3FFH. In this 
manual, the terms DOS address and PC/AT address are synonymous. 


Expanded I/O Address __ All peripheral registers reside at I/O addresses OFOOOH-OFFFFH. 
PC/AT-compatible integrated peripherals can also be mapped into 
DOS (or PC/AT) address space (OH-03FFH). 


PC/AT Address Integrated peripherals that are compatible with PC/AT system 
architecture can be mapped into PC/AT (or DOS) addresses OH— 
O3FFH. In this manual, the terms DOS address and PC/AT address 
are synonymous. 


Set and Clear The terms set and clear refer to the value of a bit or the act of giving 
it a value. If a bit is set, its value is “1”; setting a bit gives it a “1” 
value. If a bit is clear, its value is “0”; clearing a bit gives it a “O” 
value. 
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1.4 ELECTRONIC SUPPORT SYSTEMS 


Intel’s FaxBack* service provides up-to-date technical information. Intel also offers a variety of 
information on the World Wide Web. These systems are available 24 hours a day, 7 days a week, 
providing technical information whenever you need it. 


1.4.1. FaxBack Service 


FaxBack is an on-demand publishing system that sends documents to your fax machine. You can 
get product announcements, change notifications, product literature, device characteristics, de- 
sign recommendations, and quality and reliability information from FaxBack 24 hours a day, 7 
days a week. 


1-800-525-3019 (US or Canada) 
+44-1793-496646 (Europe) 
+65-256-5350 (Singapore) 
+852-2-844-4448 (Hong Kong) 
+886-2-5 14-0815 (Taiwan) 
+822-767-2594 (Korea) 

+6 1-2-975-3922 (Australia) 
1-503-264-6835 (Worldwide) 


Think of the FaxBack service as a library of technical documents that you can access with your 
phone. Just dial the telephone number and respond to the system prompts. After you select a doc- 
ument, the system sends a copy to your fax machine. 


1.4.2 World Wide Web 


Intel offers a variety of information through the World Wide Web (http://www. intel.com/). 


1.5 TECHNICAL SUPPORT 


In the U.S. and Canada, technical support representatives are available to answer your questions 
between 5 a.m. and 5 p.m. PST. You can also fax your questions to us. (Please include your voice 
telephone number and indicate whether you prefer a response by phone or by fax). Outside the 
U.S. and Canada, please contact your local distributor. 


1-800-628-8686 U.S. and Canada 
916-356-7599 U.S. and Canada 
916-356-6100 (fax) U.S. and Canada 
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1.6 PRODUCT LITERATURE 


You can order product literature from the following Intel literature centers. 


1-800-548-4725 U.S. and Canada 
708-296-9333 U.S. (from overseas) 
44(0)1793-431155 Europe (U.K.) 
44(0)1793-421333 Germany 
44(0)1793-421777 France 
81(0)120-47-88-32 Japan (fax only) 


1.6.1 Related Documents 


The following Intel documents contain additional information on designing systems that incor- 


porate the Intel486 processors. 


Intel Document Name Intel Order Number 
Datasheets 

Embedded Intel486™ SX Processor datasheet 272769 

Embedded IntelDX2™ Processor datasheet 272770 

Embedded Ultra-Low Power Intel486™ SX Processor datasheet 272731 

Embedded Ultra Low-Power Intel486™ GX Processor datasheet 272755 

Embedded Write-Back Enhanced IntelDX4™ Processor datasheet 272771 

Manuals 

Embedded Intel486™ Processor Hardware Reference Manual 273025 

Intel Architecture Software Developer's Manual, Volumes 1 and 2 243190 
243191 

Ultra-Low Power Intel486™ SX Processor Evaluation Board Manual 272815 

Intel486™ Processor Family Programmer's Reference Manual 240486 

Application Notes/Performance Briefs 

AP-505--Picking Up the Pace: Designing the IntelDX4™ Processor into 242034 

Intel486™ Processor-Based Designs 

Intel486™ Microprocessor Performance Brief 241254 

IntelDX4™ Processor Performance Brief 242446 

MultiProcessor Specification 242016 


1-6 


intel. GUIDE TO THIS MANUAL 


You can obtain the following resources from the Word Wide Web at the sites listed. 


Document Name Web Site 


Standard 1149.1—1990, IEEE Standard Test Access Port and Boundary- Contact the IEEE at 
Scan Architecture and its supplement, Standard 1149. 1a—1993 http://www.ieee.org. 
PCI Local Bus Specification, Revisions 2.0 and 2.1 Contact the PCI Special 


Interest Group at 
http://www.pcisig.com 
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CHAPTER 2 
EMBEDDED Intel486™ PROCESSOR OVERVIEW 


The Intel486™ processor family enables a range of low-cost, high-performance system designs 
capable of running the entire installed base of DOS*, Windows 95*, Windows NT*, OS/2*, and 
UNIX*-based applications written for the Intel architecture. The processor family also enables a 
wide range of high-performance embedded application designs. This family includes the follow- 
ing processors: 


¢ The IntelDX4™ processor is the fastest Intel486 processor (up to 50% faster than the 
IntelIDX2™ processor). The IntelDX4 processor integrates a 16-Kbyte unified cache and 
floating-point hardware on-chip for improved performance. 


The IntelDX4 processor is also available with a write-back on-chip cache for improved 
performance. 


¢ The IntelDX2 processor integrates an 8 Kbyte unified cache and floating-point hardware 
on-chip. 
The IntelDX4 and IntelDX2 processors use Intel’s speed-multiplying technology, allowing 
the processor core to operate at frequencies higher than the external processor bus. 


¢ The Intel486 SX processor offers the features of the IntelDX2 processor without floating- 
point hardware and clock multiplying. 


¢ The Ultra-Low Power Ultra-Low Power Intel486 SX and Ultra-Low Power Intel486 GX 
processors provide additional power-saving features for use in battery-operated and hand- 
held embedded designs. The Ultra-Low Power Intel486 SX, like the other Intel486 
processors, supports dynamic data bus sizing for 8-, 16-, or 32-bit bus sizes, whereas the 
Ultra-Low Power Intel486 GX has a 16-bit external data bus. 


The entire Intel486 processor family incorporates energy efficient “SL Technology” for mobile 
and fixed embedded computing. SL Technology enables system designs that exceed the Environ- 
ment Protection Agency’s (EPA) Energy Star program guidelines without compromising perfor- 
mance. It also increases system design flexibility and improves battery life in all Intel486 
processor-based, hand-held applications. SL Technology allows system designers to differentiate 
their power management schemes with a variety of energy efficient, battery life enhancing fea- 
tures. 


Intel486 processors provide power management features that are transparent to application and 
operating system software. Stop Clock, Auto HALT Power Down, and Auto Idle Power Down 
allow software-transparent control over processor power management. 


Equally important is the capability of the processor to manage system power consumption. 
Intel486 processor System Management Mode (SMM) incorporates a non-maskable System 
Management Interrupt (SMI#), a corresponding Resume (RSM) instruction and a new memory 
space for system management code. Although transparent to any application or operating system, 
Intel's SMM ensures seamless power control of the processor core, system logic, main memory, 
and one or more peripheral devices. 
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Intel486 processors are available in a full range of speeds (25 MHz to 100 MHz), packages (PGA, 
SQFP, PQFP, TQFP), and voltages (5 V, 3.3 V, 3.0 V and 2.0 V) to meet many system design 
requirements. 


2.1 PROCESSOR FEATURES 


All Intel486 processors consist of a 32-bit integer processing unit, an on-chip cache, and a mem- 
ory management unit. These ensure full binary compatibility with the 8086, 8088, 80186, 80286, 
Intel386™ SX, and Intel386 DX processors, and with all versions of Intel486 processors. All 
Intel486 processors offer the following features: 


¢ 32-bit RISC integer core — The Intel486 processor performs a complete set of arithmetic 
and logical operations on 8-, 16-, and 32-bit data types using a full-width ALU and eight 
general purpose registers. 


¢ Single Cycle Execution — Many instructions execute in a single clock cycle. 


¢ Instruction Pipelining — The fetching, decoding, address translation, and execution of 
instructions are overlapped within the Intel486 processor. 


¢ On-Chip Floating-Point Unit — The IntelDX2 and Intel DX4 processors support the 32-, 
64-, and 80-bit formats specified in IEEE standard 754. 


© On-Chip Cache with Cache Consistency Support — An 8-Kbyte (16-Kbyte on the IntelDX4 
processor) internal cache is used for both data and instructions. Cache hits provide zero wait 
state access times for data within the cache. Bus activity is tracked to detect alterations in 
the memory represented by the internal cache. The internal cache can be invalidated or 
flushed so that an external cache controller can maintain cache consistency. 


¢ External Cache Control — Write-back and flush controls for an external cache are provided 
so the processor can maintain cache consistency. 


¢ On-Chip Memory Management Unit — Address management and memory space protection 
mechanisms maintain the integrity of memory in a multi-tasking and virtual memory 
environment. The memory management unit supports both segmentation and paging. 


¢ Burst Cycles — Burst transfers allow a new doubleword to be read from memory on each 
bus clock cycle. This capability is especially useful for instruction prefetch and for filling 
the internal cache. 


¢ Write Buffers — The processor contains four write buffers to enhance the performance of 
consecutive writes to memory. The processor can continue internal operations after a write 
to these buffers, without waiting for the write to be completed on the external bus. 


¢ Bus Backoff — If another bus master needs control of the bus during a processor-initiated 
bus cycle, the Intel486 processor floats its bus signals, then restarts the cycle when the bus 
becomes available again. 


¢ Instruction Restart — Programs can continue execution following an exception that is 
generated by an unsuccessful attempt to access memory. This feature is important for 
supporting demand-paged virtual memory applications. 
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¢ Dynamic Bus Sizing — External controllers can dynamically alter the effective width of the 
data bus. Bus widths of 8, 16, or 32 bits can be used (the 8-bit and 32-bit bus widths are not 
available on the Ultra-Low Power Intel486 GX processor). 


¢ Boundary Scan (JTAG) — Boundary Scan provides in-circuit testing of components on 
printed circuit boards. The Intel Boundary Scan implementation conforms with the IEEE 
Standard Test Access Port and Boundary Scan Architecture. 


SL Technology provides the following features: 


¢ Intel System Management Mode — A unique Intel architecture operating mode provides a 
dedicated special purpose interrupt and address space that can be used to implement 
intelligent power management and other enhanced functions in a manner that is completely 
transparent to the operating system and applications software. 


¢ I/O Restart — An J/O instruction interrupted by a System Management Interrupt (SMI#) 
can automatically be restarted following the execution of the RSM instruction. 


¢ Stop Clock — The Intel486 processor has a stop clock control mechanism that provides two 
low-power states: a “fast wake-up” Stop Grant state and a “slow wake-up” Stop Clock state 
with CLK frequency at 0 MHz. 


¢ Auto HALT Power Down — After the execution of a HALT instruction, the Intel486 
processor issues a normal Halt bus cycle and the clock input to the Intel486 processor core 
is automatically stopped, causing the processor to enter the Auto HALT Power Down state. 


¢ Upgrade Power Down Mode — When an Intel486 processor upgrade is installed, the 
Upgrade Power Down Mode detects the presence of the upgrade, powers down the core, 
and three-states all outputs of the original processor, so the Intel486 processor enters a very 
low current mode. 


¢ Auto Idle Power Down — This function allows the processor to reduce the core frequency 
to the bus frequency when both the core and bus are idle. Auto Idle Power Down is 
software-transparent and does not affect processor performance. Auto Idle Power Down 
provides an average power savings of 10% and is only applicable to clock-multiplied 
processors. 


Enhanced Bus Mode Features (for the Write-Back Enhanced IntelDX4 processor only): 


© Write Back Internal Cache — The Write-Back Enhanced IntelDX4 processor adds write- 
back support to the unified cache. The on-chip cache is configurable to be write-back or 
write-through on a line-by-line basis. The internal cache implements a modified MESI 
protocol, which is most applicable to single processor systems. 


¢ Enhanced Bus Mode — The definitions of some signals have been changed to support the 
new Enhanced Bus Mode (Write-Back Mode). 


¢ Write Bursting — Data written from the processor to memory can be burst to provide zero 
wait state transfers. 
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The Intel486 processor also has features that facilitate high-performance hardware designs. The 
1X bus clock input eases high-frequency board-level designs. The clock multiplier on IntelDX2 
and IntelDX4 processors improves execution performance without increasing board design com- 
plexity. The clock multiplier enhances all operations operating out of the cache that are not 
blocked by external bus accesses. The burst bus feature enables fast cache fills. 


Intel486™ PROCESSOR PRODUCT OPTIONS 


Table 2-1 shows the Intel486 processors available by clock mode, supply voltage, maximum fre- 
quency, and package. An individual product has either a 5 V supply voltage or a 3.3 V supply 
voltage, but not both. Likewise, an individual product may have 1x, 2x, or 3x clock. Please con- 


2.2 


tact Intel for the latest product availability and specifications. 


Table 2-1. Product Options 


Processor 
168- | 208- | 196- | 176- 
Intel486™ Processor] Vcc! Frequency (Mp2) Pin Lead | Lead | Lead 
16| 201251331 40/50 66 | 75 100 PGA | SQFP | PQFP| TQFP 
1x Clock 
Intel486 SX 3.3V Viv 
Processor 
Ultra-Low Power 2.4-3.3 v v 
Intel486 SX 
Processor 2.7-3.3 v 4 
2.0-3.3| ¥ v 
Voor? v v 
Ultra-Low Power 22-33 
Intel486 GX 
Processor 2.4-3.3 v v 
Voce? v v 
2.7-3.3 
2x Clock 
IntelDX2™ Processor 3.3 v | | | | v | 
3x Clock 
Write-Back Enhanced 3.3 vi v v v 
IntelIDX4™ Processor 


+ Vgc is the main power to the processor core. 


$ Vecp is used in the ULP486 SX/GX processors as a separate power supply to the peripherals. 
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2.2.1. Operating Modes and Compatibility 


The Intel486 processor can run in modes that give it object-code compatibility with software writ- 
ten for the 8086, 80286, and Intel386 processor families. The operating mode is set in software 
as one of the following: 


¢ Real Mode: When the processor is powered up or reset, it is initialized in Real Mode. This 
mode has the same base architecture as the 8086 processor but allows access to the 32-bit 
register set of the Intel486 processor. The address mechanism, maximum memory size 
(1 Mbyte), and interrupt handling are identical to the Real Mode of the 80286 processor. 
Nearly all Intel486 processor instructions are available, but the default operand size is 16 
bits; in order to use the 32-bit registers and addressing modes, override instruction prefixes 
must be used. The primary purpose of Real Mode is to set up the processor for Protected 
Mode operation. 


¢ Protected Mode (also called Protected Virtual Address Mode): The complete capabilities of 
the Intel486 processor become available when programs are run in Protected Mode. In 
addition to segmentation protection, paging can be used in Protected Mode. The linear 
address space is four gigabytes and virtual memory programs of up to 64 terabytes can be 
tun. All existing 8086, 80286, and Intel386 processor software can be run under the 
Intel486 processor’s hardware-assisted protection mechanism. The addressing mechanism 
is more sophisticated in Protected Mode than in Real Mode. 


¢ Virtual 8086 Mode, a sub-mode of Protected Mode, allows 8086 programs to be run with 
the segmentation and paging protection mechanisms of Protected Mode. This mode offers 
more flexibility than the Real Mode for running 8086 programs. Using this mode, the 
Intel486 processor can execute 8086 operating systems and applications simultaneously 
with an Intel486 operating system and both 80286 and Intel486 processor applications. 
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2.3. SYSTEMS APPLICATIONS 

Most Intel486 processor systems can be grouped as one of these types: 
¢ Embedded Personal Computer 
¢ Embedded Controller 


Each type of system has distinct design goals and constraints, as described in the following sec- 
tions. Software running on the processor, even in stand-alone embedded applications, should use 
a standard operating system such as DOS, Windows 95, Windows NT, OS/2, or UNIX System 
V/386*, to facilitate debugging, documentation, and transportability. 


2.3.1. Embedded Personal Computers 


In single-processor embedded systems, the processor interacts directly with I/O devices and 
DRAM memory. Other bus masters such as a LAN coprocessor typically reside on the system 
bus; conventional personal computer architecture puts most peripherals on separate plug-in 
boards. Expansion is typically limited to memory boards and I/O boards. A standard I/O archi- 
tecture such as MCA or EISA is used. Figure 2-1 shows an example of a personal computer ap- 
plication. 


Optional Intel486™ 
Level-2 Cache Processor 


Processor Bus 


Local Bus 


Peripheral Controller 
Controller 


System Bus 


Other 


Peripheral 


Figure 2-1. Embedded Personal Computer and Embedded Controller Example 
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External cache is optional in such environments, particularly if system performance is not a crit- 
ical parameter. Where an external cache is used, memory-access speeds improve only if the cache 
memory access has zero to one wait states. 


2.3.2 Embedded Controllers 


Most embedded controllers perform real-time tasks. The performance of the Intel486 processor 
and its compatibility with the extensive installed base of Intel386 processors are important factors 
in its choice for embedded designs. Embedded controllers are usually implemented as stand- 
alone systems, with less expansion capability than other applications because they are tailored 
specifically to a single environment. 


If code must be stored in EPROM, ROM, or Flash for non-volatility, but performance is also a 
critical issue, then the code should be copied into RAM provided specifically for this purpose. 
Frequently used routines and variables, such as interrupt handlers and interrupt stacks, can be 
placed in the processor’s internal cache so they are always available quickly. 


Embedded controllers usually require less memory than other applications, and control programs 
are usually tightly written machine-level routines that need optimal performance in a limited va- 
riety of tasks. The processor typically interacts directly with I/O devices and DRAM memory. 
Other peripherals connect to the system bus. 
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CHAPTER 3 
ARCHITECTURAL OVERVIEW 


3.1. INTERNAL ARCHITECTURE 


3.1.1 Overview 


The Intel486™ SX processor and the Ultra-Low Power Intel486 SX have a 32-bit architecture 
with on-chip memory management and level-1 cache. 


The IntelDX2™ and IntelDX4™ processors also have a 32-bit architecture with on-chip memory 
management and cache, but add clock multiplier and floating point units. The Intel486 SX, Ultra- 
Low Power Intel486 SX, and Intel486 DX processors support dynamic bus sizing for the external 
data bus; that is, the bus size can be specified as 8-, 16-, or 32-bits wide. 


Internally, the ultra-low power processors are similar to the Intel486 SX, but add a specialized 
clock control unit. Although the Ultra-Low Power Intel486 SX supports dynamic bus sizing, the 
Ultra-Low Power Intel486 GX supports only a 16-bit external data bus. The Ultra-Low Power 
Intel486 GX also has advanced power management features. 


Table 3-1 lists the functional units of the embedded Intel486 processors. 


Table 3-1. Intel486™ Processor Family Functional Units 


Ultra-Low Power 
rageasiaiiny amma ates) ee ae 
Intel486 GX Processors 
Bus Interface v v v 
Cache (L1) v v v 
Instruction Prefetch v v v 
Instruction Decode v v v 
Control v v v 
Integer and Datapath v v v 
Segmentation v v v 
Paging v v v 
Floating-Point v 
Clock Multiplier v 
Clock Control v 


Figure 3-1 is a block diagram of the embedded IntelDX2 and IntelDX4 processors. Note that the 
cache unit is 8 Kbytes for the IntelDX2 processor and 16 Kbytes for the IntelDX4. 
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Figure 3-2 is a block diagram of the embedded Intel486 SX processor. Figures 3-3 and 3-4 are 
block diagrams of the Ultra-Low Power Intel486 SX and GX processors, respectively. 
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Figure 3-1. IntelDX2™ and IntelDX4™ Processors Block Diagram 


3-2 


tal. ARCHITECTURAL OVERVIEW 


64-Bit Interunit Transfer Bus 


32-Bit Data Bus 


32-Bit Data Bus 


Linear Address 


Bus Interface 


C=» Address 
eee Drivers 
Write Buffers 
et ae 
= ms 


A31-A2 


Segmentation 
BE3#- BEO# 


Unit 
Descriptor 
Registers 
Limit and 
Attribute PLA 


Translation 
Lookaside 
Buffer 


Data Bus 
Transceivers 


ADS# W/R# D/C# M/lO# 
PCD PWT RDY# LOCK# 
PLOCK# BOFF# A20M# 
BREQ HOLD HLDA 
RESET SRESET INTR 
NMI SMI# SMIACT# 
FERR# IGNNE# 
STPCLK# 


Request 
Sequencer === 
32-Byte Code 
Queue Burst Bus | BRDY# BLAST# 


Control | > 
2x16 Bytes 
I : 
Sete Instruction Bus Size paeeiata ts 
F Decode Control [=== 
Test Unit 


KEN# FLUSH# 
Decoded AHOLD EADS# 
Instruction 


Path 


Bus Control 


Displacement Bus 


Micro- 
Instruction 


Control 
ROM 


DP3-DP0 PCHK# 
Sere'ton | = 
and Control 

TCK TMS 

TDI TDO 


Boundary 
Scan 
Control 


A5443-01 


Figure 3-2. Intel486™ SX Processor Block Diagram 
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Figure 3-3. Ultra-Low Power Intel486™ SX Processor Block Diagram 


tal. ARCHITECTURAL OVERVIEW 


64-Bit Interunit Transfer Bus 
32-Bit Data Bus CLK Input 
Inpu' 
32-Bit Data Bus Clock 
Control 
Linear Address 


Bus Interface A31-A2 


ap: Address BES#- BEO# 
heal Drivers 


Segmentation 
Unit 


Descriptor 

Registers 

Limit and 
Attribute PLA 


Write Buffers 
4x32 


Data Bus 
Transceivers 


Bus Control 


Translation 
Lookaside 
Buffer 


D15-DO 


| 


ADS# W/R# D/C# M/lIO# 
PCD PWT RDY# LOCK# 
PLOCK# BOFF# A20M# 
BREQ HOLD HLDA 
RESET SRESET INTR 
NMI SMI# SMIACT# 
STPCLK# 


Displacement Bus 


Prefetcher 


Micro- 


Instruction Request 
Sequencer ———»> 


aie Code Burst Bus. | BRDY# BLAST# 
ust Control | TT 


2x16 Bytes 


Control & ' : 
Protection nstruction 
Test Unit Decode 


Bus Size 
Control 


KEN# FLUSH# 
AHOLD EADS# 


=A 


DP1-DP0, PCHK# 


Decoded 
Instruction 
Path 


Control 
ROM 


Generation 


and Control ap 


Boundary | TCK TMS 
Scan TDI TDO 


Control 


A5851-01 


Figure 3-4. Ultra-Low Power Intel486™ GX Processor Block Diagram 
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The functional units of the Intel486 processor are described in the following sections. 


3.1.2. Bus Interface Unit (BIU) 


The bus interface unit (BIU) prioritizes and coordinates data transfers, instruction prefetches, and 
control functions between the processor’s internal units and the outside system. Internally, the 
BIU communicates with the cache and the instruction prefetch units through three 32-bit buses, 
as shown in Figure 3-1. Externally, the BIU provides the processor bus signals, as described in 
Chapter 3. Except for cycle definition signals, all external bus cycles, memory reads, instruction 
prefetches, cache line fills, etc., look like conventional microprocessor cycles to external hard- 
ware, with all cycles having the same bus timing. 


The BIU contains the following architectural features: 


¢ Address Transceivers and Drivers — The A31—A2 address signals are driven on the 
processor bus, together with their corresponding byte-enable signals, BE3#-BE0#. The 
high-order 28 address signals are bidirectional, allowing external logic to drive cache 
invalidation addresses into the processor. 


¢ Data Bus Transceivers — The D31—D0 data signals are driven onto and received from the 
processor bus (for the Ultra-Low Power Intel486 GX processor, signals D15—DO comprise 
the data bus transceivers). 


¢ Bus Size Control — Three sizes of external data bus can be used: 32, 16, and 8 bits wide. 
Two inputs (BS16# and BS8#) from external logic specify the width to be used. Bus size 
can be changed on a cycle-by-cycle basis. The Ultra-Low Power Intel486 GX does not 
support dynamic bus sizing; its external data bus is 16 bits wide. 


¢ Write Buffering — Up to four write requests can be buffered, allowing many internal 
operations to continue without waiting for write cycles to be completed on the processor 
bus. 


¢ Bus Cycles and Bus Control — A large selection of bus cycles and control functions are 
supported, including burst transfers, non-burst transfers (single- and multiple-cycle), bus 
arbitration (bus request, bus hold, bus hold acknowledge, bus locking, bus pseudo-locking, 
and bus backoff), floating-point error signalling, interrupts, and reset. Two software- 
controlled outputs enable page caching on a cycle-by-cycle basis. One input and one output 
are provided for controlling burst read transfers. 


¢ Parity Generation and Control — Even parity is generated on writes to the processor and 
checked on reads. An error signal indicates a read parity error. 


¢ Cache Control — Cache control and consistency operations are supported. Four inputs 
(KEN#, Flush#, PCD, and PWT) allow the external system to control the consistency of 
data stored in the internal cache unit. Two special bus cycles allow the processor to control 
the consistency of external cache. 


NOTE 


The PWT and PCD bits function differently on the Write-back Enhanced 
IntelDX4 processor. See Section 7.6.1, Write-Back Enhanced IntelDX4™ 
Processor and Processor Page Cacheability (pg. 7-11) for more information. 
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3.1.3 Data Transfers 


To support the cache, the BIU reads 16-byte cacheable transfers of operands, instructions, and 
other data on the processor bus and passes them to the cache unit. When cache contents are up- 
dated from an internal source, such as a register, the BIU writes the updated cache information to 
the external system. Non-cacheable read transfers are passed through the cache to the integer or 
floating-point units. 


During instruction prefetch, the BIU reads instructions on the processor bus and passes them to 
both the instruction prefetch unit and the cache. The instruction prefetch unit may then obtain its 
inputs directly from the cache. 


3.1.4 Write Buffers 


The BIU has temporary storage for buffering up to four 32-bit write transfers to memory. Ad- 
dresses, data, or control information can be buffered. Single I/O-mapped writes are not buffered, 
although multiple I/O writes may be buffered. The buffers can accept memory writes as quickly 
as one per clock. Once a write request is buffered, the internal unit that generated the request is 
free to continue processing. If no higher-priority request is pending and the bus is free, the trans- 
fer is propagated as an immediate write cycle to the processor bus. When all four write buffers 
are full, any subsequent write transfer stalls inside the processor until a write buffer becomes 
available. 


The BIU can re-order pending reads in front of buffered writes. This is done because pending 
reads can prevent an internal unit from continuing, whereas buffered writes need not have a det- 
rimental effect on processing speed. 


Writes are propagated to the processor bus in the first-in-first-out order in which they are received 
from the internal unit. However, a subsequently generated read request (data or instruction) may 
be re-ordered in front of buffered writes. As a protection against reading invalid data, this re-or- 
dering of reads in front of buffered writes occurs only if all buffered writes are cache hits. Be- 
cause an external read is generated only for a cache miss, and is re-ordered in front of buffered 
writes only if all such buffered writes are cache hits, any read generated on the external bus with 
this protection never reads a location that is about to be written by a buffered write. This re-or- 
dering can only happen once for a given set of buffered writes, because the data returned by the 
read cycle could otherwise replace data about to be written from the write buffers. 


To ensure that no more than one such re-ordering is done for a given set of buffered writes, all 
buffered writes are re-flagged as cache misses when a read request is re-ordered ahead of them. 
Buffered writes thus marked are propagated to the processor bus before the next read request is 
acted upon. Invalidation of data in the internal cache also causes all pending writes to be flagged 
as cache misses. Disabling the cache unit disables the write buffers, which eliminates any possi- 
bility of re-ordering bus cycles. 
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3.1.5 Locked Cycles 


The processor can generate signals to lock a contiguous series of bus cycles. These cycles can 
then be performed without interference from other bus masters, if external logic observes these 
lock signals. One example of a locked operation is a semaphor read-modify-write update, where 
a resource control register is updated. No other operations should be allowed on the bus until the 
entire locked semaphor update is completed. 


When a locked read cycle is generated, the internal cache is not read. All pending writes in the 
buffer are completed first. Only then is the read part of the locked operation performed, the data 
modified, the result placed in a write buffer, and a write cycle performed on the processor bus. 
This sequence of operations ensures that all writes are performed in the order in which they were 
generated. 


3.1.6 I/O Transfers 
Transfers to and from I/O locations have some restrictions to ensure data integrity: 
¢ Caching — I/O reads are never cached. 


¢ Read Re-ordering — I/O reads are never re-ordered ahead of buffered writes to memory. 
This ensures that the processor has completed updating all memory locations before reading 
status from a device. 


¢ Writes — Single I/O writes are never buffered. When processing an OUT instruction, 
internal execution stops until all buffered writes and the I/O write are completed on the 
processor bus. This allows time for external logic to drive a cache invalidate cycle or mask 
interrupts before the processor executes the next instruction. The processor completes 
updating all memory locations before writing to the I/O location. Repeated OUT 
instructions may be buffered. 


The write buffers and the cache unit determine I/O device recovery time. In the Intel386 proces- 
sor, back-to-back write recovery time could be guaranteed to exceed a certain value by inserting 
a jump to the next instruction that writes to the I/O device. This forced an instruction prefetch 
cycle that could only be performed after the preceding write was completed. This technique is not 
used in the Intel486 processor because a prefetch can be satisfied internally by the cache and re- 
covery time may be too short. The same effect is achieved in the Intel486 processor by explicitly 
generating a read to an area of memory that is not cacheable. Because the Intel486 processor does 
not buffer single I/O writes, such a read is not done until the I/O write is completed. 
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3.2. CACHE UNIT 


The cache unit stores copies of recently read instructions, operands, and other data. When the pro- 
cessor requests information already in the cache, called a cache hit, no processor-bus cycle is re- 
quired. When the processor requests information not in the cache, called a cache miss, the 
information is read into the cache in one or more 16-byte cacheable data transfers, called cache 
line fills. An internal write request to an area currently in the cache causes two distinct actions if 
the cache is using a write-through policy: the cache is updated, and the write is also passed 
through the cache to memory. If the cache is using a write-back policy, then the internal write 
request only causes the cache to be updated and the write is stored for future main memory up- 
dating. 


The cache transfers data to other units on two 32-bit buses, as shown in Figure 3-1. The cache 
receives linear addresses on a 32-bit bus and the corresponding physical addresses on a 20-bit 
bus. The cache and instruction prefetch units are closely coupled. 16-Byte blocks of instructions 
in the cache can be passed quickly to the instruction prefetch unit. Both units read information in 
16-byte blocks. 


The cache can be accessed as often as once each clock. The cache acts on physical addresses, 
which minimizes the number of times the cache must be flushed. When both the cache and the 
cache write-through functions are disabled, the cache may be used as a high-speed RAM. 


3.2.1. Cache Structure 


The cache has a four-way set associative organization. There are four possible cache locations to 
store data from a given area of memory. Four-way association is a compromise between the speed 
of a direct-mapped cache during cache hits and the high cache-hit ratio of a fully associative 
cache. As shown in Figure 3-5, the 8-Kbyte data block is divided into four data ways, each con- 
taining 128 16-byte sets, or cache lines (the DX4 processor has 256 16-byte sets). Each cache line 
holds data from 16 successive byte addresses in memory, beginning with an address divisible 
by 16. 
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Cache addressing is performed by dividing the high-order 28 bits of the physical address into 
three parts, as shown in Figure 3-5. The 7 bits of the index field specify the set number, one of 
128, within the cache. The high-order 21 bits (20 on the IntelDX4 processor) are the tag field; 
these bits are compared with tags for each cache line in the indexed set, and they indicate whether 
a 16-byte cache line is stored for that physical address. The low-order 4 bits of the physical ad- 
dress select the byte within the cache line. Finally, a 4-bit valid field, one for each way within a 
given set, indicates whether the cached data at that physical address is currently valid. 
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3.2.2 Cache Updating 


When a cache miss occurs on a read, the 16-byte block containing the requested information is 
written into the cache. Data in the neighborhood of the required data is also read into the cache, 
but the exact position of data within the cache line depends on its location in memory with respect 
to addresses divisible by 16. 


Any area of memory can be cacheable, but any page of memory can be declared not cacheable 
by setting a bit in its page table entry. The I/O region of memory is non-cacheable. When a read 
from memory is initiated on the bus, external logic can indicate whether the data may be placed 
in cache, as discussed in Chapter 10, “Bus Operation.” If the read is cacheable, the processor at- 
tempts to read an entire 16-byte cache line. 


The cache unit follows a write-through cache policy. The unit on the IntelDX4 processor can be 
configured to be a write-through or write-back cache. Cache line fills are performed only for read 
misses, never for write misses. When the processor is enabled for normal caching and write- 
through operation, every internal write to the cache (cache hit) not only updates the cache but is 
also passed along to the BIU and propagated through the processor bus to memory. The only con- 
ditions under which data in the cache differs from the corresponding data in memory occur when 
a processor write cycle to memory is delayed by buffering in the BIU, or when an external bus 
master alters the memory area mapped to the internal cache. When the IntelDX4 processor is en- 
abled for normal caching and write-back operation, an internal write only causes the cache to be 
updated. The modified data is stored for the future update of main memory and is not immediately 
written to memory. 


3.2.3. Cache Replacement 


Replacement in the cache is handled by a pseudo-LRU (least recently used) mechanism. This 
mechanism maintains three bits for each set in the valid/LRU block, as shown in Figure 3-5. The 
LRU bits are updated on each cache hit or cache line fill. Each cache line (four per set) also has 
an associated valid bit that indicates whether the line contains valid data. When the cache is 
flushed or the processor is reset, all of the valid bits are cleared. When a cache line is to be filled, 
a location for the fill is selected by simply finding any cache line that is invalid. If no cache line 
is invalid, the LRU bits select the line to be overwritten. Valid bits are not set for lines that are 
only partially valid. 


Cache lines can be invalidated individually by a cache line invalidation operation on the proces- 
sor bus. When such an operation is initiated, the cache unit compares the address to be invalidated 
with tags for the lines currently in cache and clears the valid bit if a match is found. A cache flush 
operation is also available. This invalidates the entire contents of the internal cache unit. 


3.2.4 Cache Configuration 


Configuration of the cache unit is controlled by two bits in the processor’s machine status register 
(CRO). One of these bits enables caching (cache line fills). The other bit enables memory write- 
through. Table 3-2 shows the four configuration options. Chapter 10, “Bus Operation,” gives de- 
tails. 
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Table 3-2. Cache Configuration Options 


Write-through 


Cache Enabled Enabled 


Operating Mode 


no no Cache line fills, cache write-throughs, and cache invalidations are 
disabled. This configuration allows the internal cache to be used as 
high-speed static RAM. 


no yes Cache line fills are disabled, and cache write-throughs and cache 
invalidations are enabled. This configuration allows software to 
disable the cache for a short time, then re-enable it without flushing 
the original contents. 


yes no INVALID 


yes yes Cache line fills, cache write-throughs, and cache invalidations are 
enabled. This is the normal operating configuration. 


When caching is enabled, memory reads and instruction prefetches are cacheable. These transfers 
are cached if external logic asserts the cache enable input in that bus cycle, and if the current page 
table entry allows caching. During cycles in which caching is disabled, cache lines are not filled 
on cache misses. However, the cache remains active even though it is disabled for further filling. 
Data already in the cache is used if it is still valid. When all data in the cache is flagged invalid, 
as happens in a cache flush, all internal read requests are propagated as bus cycles to the external 
system. 


When cache write-through is enabled, all writes, including those that are cache hits, are written 
through to memory. Invalidation operations remove a line from cache if the invalidate address 
maps to a cache line. When cache write-throughs are disabled, an internal write request that is a 
cache hit does not cause a write-through to memory, and cache invalidation operations are dis- 
abled. With both caching and cache write-through disabled, the cache can be used as a high-speed 
static RAM. In this configuration, the only write cycles that are propagated to the processor bus 
are cache misses, and cache invalidation operations are ignored. 


The IntelDX4 processor can also be configured to use a write-back cache policy. For detailed in- 
formation on the Intel486 processor cache feature, and on the Write-Back Enhanced IntelDX4 
processor, refer to Chapter 7, “On-Chip Cache.” 
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3.3. INSTRUCTION PREFETCH UNIT 


When the BIU is not performing bus cycles to execute an instruction, the instruction prefetch unit 
uses the BIU to prefetch instructions. By reading instructions before they are needed, the proces- 
sor rarely needs to wait for an instruction prefetch cycle on the processor bus. 


Instruction prefetch cycles read 16-byte blocks of instructions, starting at addresses numerically 
greater than the last-fetched instruction. The prefetch unit, which has a direct connection (not 
shown in Figure 3-1) to the paging unit, generates the starting address. The 16-byte prefetched 
blocks are read into both the prefetch and cache units simultaneously. The prefetch queue in the 
prefetch unit stores 32 bytes of instructions. As each instruction is fetched from the queue, the 
code part is sent to the instruction decode unit and (depending on the instruction) the displace- 
ment part is sent to the segmentation unit, where it is used for address calculation. If loops are 
encountered in the program being executed, the prefetch unit gets copies of previously executed 
instructions from the cache. 


The prefetch unit has the lowest priority for processor bus access. Assuming zero wait-state mem- 
ory access, prefetch activity never delays execution. However, if there is no pending data transfer, 
prefetching may use bus cycles that would otherwise be idle. The prefetch unit is flushed when- 
ever the next instruction needed is not in numerical sequence with the previous instruction; for 
example, during jumps, task switches, exceptions, and interrupts. 


The prefetch unit never accesses beyond the end of a code segment and it never accesses a page 
that is not present. However, prefetching may cause problems for some hardware mechanisms. 
For example, prefetching may cause an interrupt when program execution nears the end of mem- 
ory. To keep prefetching from reading past a given address, instructions should come no closer 
to that address than one byte plus one aligned 16-byte block. 


3.4 INSTRUCTION DECODE UNIT 


The instruction decode unit receives instructions from the instruction prefetch unit and translates 
them in a two-stage process into low-level control signals and microcode entry points, as shown 
in Figure 3-1. Most instructions can be decoded at a rate of one per clock. Stage 1 of the decode, 
shown in Figure 3-8, initiates a memory access. This allows execution of a two-instruction se- 
quence that loads and operates on data in just two clocks, as described in Section 3.2. 


The decode unit simultaneously processes instruction prefix bytes, opcodes, modR/M bytes, and 
displacements. The outputs include hardwired microinstructions to the segmentation, integer, and 
floating-point units. The instruction decode unit is flushed whenever the instruction prefetch unit 
is flushed. 
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3.5 CONTROL UNIT 


The control unit interprets the instruction word and microcode entry points received from the in- 
struction decode unit. The control unit has outputs with which it controls the integer and floating- 
point processing units. It also controls segmentation because segment selection may be specified 
by instructions. 


The control unit contains the processor’s microcode. Many instructions have only one line of mi- 
crocode, so they can execute in an average of one clock cycle. Figure 3-8 shows how execution 
fits into the internal pipelining mechanism. 


3.6 INTEGER (DATAPATH) UNIT 


The integer and datapath unit identifies where data is stored and performs all of the arithmetic 
and logical operations available in the Intel386 processor’s instruction set, plus a few new instruc- 
tions. It has eight 32-bit general-purpose registers, several specialized registers, an ALU, and a 
barrel shifter. Single load, store, addition, subtraction, logic, and shift instructions execute in one 
clock. 


Two 32-bit bidirectional buses connect the integer and floating-point units. These buses are used 
together for transferring 64-bit operands. The same buses also connect the processing units with 
the cache unit. The contents of the general purpose registers are sent to the segmentation unit on 
a separate 32-bit bus for generation of effective addresses. 


3.7. FLOATING-POINT UNIT 


The floating-point unit executes the same instruction set as the 387 math coprocessor. The unit 
contains a push-down register stack and dedicated hardware for interpreting the 32-, 64-, and 80- 
bit formats as specified in IEEE Standard 754. An output signal passed through to the processor 
bus indicates floating-point errors to the external system, which in turn can assert an input to the 
processor indicating that the processor should ignore these errors and continue normal operations. 


3.7.1. IntelDX2™ and IntelDX4™ Processor On-Chip Floating-Point Unit 


The IntelDX2 and IntelDX4 processors incorporate the basic Intel486 processor 32-bit architec- 
ture, with on-chip memory management and cache memory units. They also have an on-chip 
floating-point unit (FPU) that operates in parallel with the arithmetic and logic unit. The FPU pro- 
vides arithmetic instructions for a variety of numeric data types and executes numerous built-in 
transcendental functions (e.g., tangent, sine, cosine, and log functions). The floating-point unit 
fully conforms to the ANSI/IEEE standard 754-1985 for floating-point arithmetic. 


All software written for the Intel1386 processor, Intel387 math coprocessor and previous members 
of the 86/87 architectural family runs on these processors without modifications. 


intel. ARCHITECTURAL OVERVIEW 


3.8 SEGMENTATION UNIT 


A segment is a protected, independent address space. Segmentation is used to enforce isolation 
among application programs, to invoke recovery procedures, and to isolate the effects of pro- 
gramming errors. 


The segmentation unit translates a segmented address issued by a program, called a logical ad- 
dress, into an unsegmented address, called a linear address. The locations of segments in the lin- 
ear address space are stored in data structures called segment descriptors. The segmentation unit 
performs its address calculations using segment descriptors and displacements (offsets) extracted 
from instructions. Linear addresses are sent to the paging and cache units. When a segment is ac- 
cessed for the first time, its segment descriptor is copied into a processor register. A program can 
have as many as 16,383 segments. Up to six segment descriptors can be held in processor regis- 
ters at a time. Figure 3-6 shows the relationships between logical, linear, and physical addresses. 
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Figure 3-6. Segmentation and Paging Address Formats 
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3.9 PAGING UNIT 


The paging unit allows access to data structures larger than the available memory space by keep- 
ing them partly in memory and partly on disk. Paging divides the linear address space into 
4-Kbyte blocks called pages. Paging uses data structures in memory called page tables for map- 
ping a linear address to a physical address. The cache uses physical addresses and puts them on 
the processor bus. The paging unit also identifies problems, such as accesses to a page that is not 
resident in memory, and raises exceptions called page faults. When a page fault occurs, the oper- 
ating system has a chance to bring the required page into memory from disk. If necessary, it can 
free space in memory by sending another page out to disk. If paging is not enabled, the physical 
address is identical to the linear address. 


The paging unit includes a translation lookaside buffer (TLB) that stores the 32 most recently 
used page table entries. Figure 3-7 shows the TLB data structures. The paging unit looks up linear 
addresses in the TLB. If the paging unit does not find a linear address in the TLB, the unit gen- 
erates requests to fill the TLB with the correct physical address contained in a page table in mem- 
ory. Only when the correct page table entry is in the TLB does the bus cycle take place. When the 
paging unit maps a page in the linear address space to a page in physical memory, it maps only 
the upper 20 bits of the linear address. The lowest 12 bits of the physical address come unchanged 
from the linear address. 
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Figure 3-7. Translation Lookaside Buffer 
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Most programs access only a small number of pages during any short span of time. When this is 
true, the pages stay in memory and the address translation information stays in the TLB. In typical 
systems, the TLB satisfies 99% of the requests to access the page tables. The TLB uses a pseudo- 
LRU algorithm, similar to the cache, as a content-replacement strategy. 


The TLB is flushed whenever the page directory base register (CR3) is loaded. Page faults can 
occur during either a page directory read or a page table read. The cache can be used to supply 
data for the TLB, although this may not be desirable when external logic monitors TLB updates. 


Unlike segmentation, paging is invisible to application programs and does not provide the same 
kind of protection against programs altering data outside a restricted part of memory. Paging is 
visible to the operating system, which uses it to satisfy application program memory require- 
ments. For more information on paging and segmentation, see the Intel486™ Processor Family 
Programmer's Reference Manual. 


3.9.1. Instruction Pipelining 


Not every instruction involves all internal units. When an instruction needs the participation of 
several units, each unit operates in parallel with others on instructions at different stages of exe- 
cution. Although each instruction is processed sequentially, several instructions are at varying 
stages of execution in the processor at any given time. This is called instruction pipelining. In- 
struction prefetch, instruction decode, microcode execution, integer operations, floating-point 
operations, segmentation, paging, cache management, and bus interface operations are all per- 
formed simultaneously. Figure 3-8 shows some of this parallelism for a single instruction: the in- 
struction fetch, two-stage decode, execution, and register write-back of the execution result. Each 
stage in this pipeline can occur in one clock cycle. 
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Figure 3-8. Internal Pipelining 
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The internal pipelining on the Intel486 processor offers an important performance advantage over 
many single-clock RISC processors: in the Intel486 processor, data can be loaded from the cache 
with one instruction and used by the next instruction in the next clock. This performance advan- 
tage results from the stage-1 decode step, which initiates memory accesses before the execution 
cycle. Because most compilers and application programs follow load instructions with instruc- 
tions that operate on the loaded data, this method optimizes the execution of existing binary code. 


The method has a performance trade-off: an instruction sequence that changes register contents 
and then uses that register in the next instruction to access memory takes three clocks rather than 
two. This trade-off is only a minor disadvantage, however, since most instructions that access 
memory use the stable contents of the stack pointer or frame pointer, and the additional clock is 
not used very often. Compilers often place an unrelated instruction between one that changes an 
addressing register and one that uses the register. Such code is compatible with the Intel386 pro- 
cessor, and the Intel486 processor provides special stack increment/decrement hardware and an 
extra register port to execute back-to-back stack push/pop instructions in a single clock. 


3.10 SYSTEM ARCHITECTURE 


The Intel486 processor can be the foundation for single-processor or multi-processor embedded 
systems. A single-processor system might be an embedded personal computer designed to use the 
Intel486 processor. A system design of this type offers higher performance through the integra- 
tion of floating-point processing, memory management, and caching. More complex embedded 
systems may use multiple processors that provide, at chip-level, the equivalent of board-level 
functions. Designs of this type are typically used in multi-user machines, scientific workstations, 
and engineering workstations. 


A typical Intel486 design is shown in Figure 3-9. This example uses a single Intel486 processor 
with external cache. Other examples of system design are illustrated in the figures that follow. 
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Figure 3-9. A Typical Intel486™ Processor System 


3.10.1 Single Processor System 


In single-processor systems, the processor handles all peripheral resources and intelligent devic- 
es, and executes all software. The Intel486 processor does this in a more efficient way and for a 
wider range of task complexity than earlier processors. Single-processor systems offer small size 
and low cost in exchange for flexibility in upgrading or expanding the system. Typical applica- 
tions include personal computers, small desktop workstations, and embedded controllers. Such 
applications are implemented as a single board, usually called a motherboard; the processor bus 
does not extend beyond the board occupied by the Intel486 processor. 
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Figure 3-10 shows an example of such a system. In a single-processor system, devices that share 
the processor bus must be selected carefully. All components must interact directly with the pro- 
cessor bus or have interface logic that allows them to do so. The total bus bandwidth requirements 
of other components should be no more than 50% of the available processor-bus bandwidth. Traf- 
fic above 50% degrades performance of the processor. 


Level-2 Intel486™ 
Cache Processor 


Processor Bus 


i uy u) 


Peripheral 


Memor 
, Controller Controller 


Figure 3-10. Single-Processor System 


Two basic design approaches are used to elaborate the single-processor system into a more com- 
plex system. The first approach is to add more devices to the processor bus. This can be done up 
to the limit mentioned above: no more than 50% of the processor-bus bandwidth should be used 
by devices other than the Intel486 processor. The second design approach is to add more buses 
to the system. By adding buses, greater bus bandwidth is created in the system as a whole, which 
in turn allows more devices to be added to the system. The two approaches go hand-in-hand to 
expand the capabilities of a system. The sections below give only a few examples of the great 
variety of designs that are possible with Intel486 processor-compatible devices. 


3.10.2 Loosely Coupled Multi-Processor System 


Loosely coupled multi-processor systems include board-level products that communicate with 
one another through a standard system bus. In this architecture, each board contains a processor 
and associated logic. There is typically only one processor per board. Components within each 
board communicate on either a processor bus or on the buffered system bus. The system bus usu- 
ally provides extra bandwidth beyond the processor bus. 
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A typical system is shown in Figure 3-11. Such system-bus boards typically occur in higher-end 
personal computers and embedded systems that allow for modular expansion. A typical design 
would include a coprocessor or LAN interface board in a personal computer, or a network-inter- 
face board in a file server or gateway. Systems built from these boards can contain a mix of pro- 
cessor types. Devices attached to the processor bus on a given board make demands that may 
affect system performance. For example, a typical system may use up to 3% of the bus bandwidth 
to handle 10-Mbit/second Ethernet traffic. 
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Figure 3-11. Loosely Coupled Multi-processor System 


3.10.3 System Components 


Intel offers several chips that are highly compatible with the Intel486 processor. These compo- 
nents can be used to design high-performance embedded systems with a minimum of effort and 
cost. For components not directly connectable to the Intel486 processor bus, industry-standard 
interfaces can be used. 


The Intel486 processor provides all integer and floating-point CPU functions plus many of the 
peripheral functions required in a typical computer system. It executes the complete instruction 
set of the Intel386 processor and Intel387 DX numerics coprocessor, with some extensions. The 
processor eliminates the need for an external memory management unit, and the on-chip cache 
minimizes the need for external cache and associated control logic. 


The remaining chapters of this manual detail the Intel486 processor’s architecture, hardware 
functions, and interfacing. For more information on the architecture and software interface, see 
the Intel486™ Processor Programmer’s Reference Manual. 
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3.10.4 External Cache 


External cache allows a system to achieve maximum performance. This cache is essential in 
tightly coupled multi-processor embedded systems. The external cache consists of cache memory 
(usually fast SRAM) and cache control logic. 


External cache systems typically provide access to the cache from both the processor and the sys- 
tem buses. This is shown in Figure 3-12. These caches typically monitor processor memory ac- 
cesses, processor access time, and consistency between cache and memory. The cache controller 
is responsible for maintaining an optimal mix of data and instructions in cache. 
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Figure 3-12. External Cache 
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3.11 MEMORY ORGANIZATION 


Memory on the Intel486 processor is divided up into 8-bit quantities (bytes), 16-bit quantities 
(words), and 32-bit quantities (dwords). Words are stored in two consecutive bytes in memory 
with the low-order byte at the lowest address, the high order byte at the high address. Dwords are 
stored in four consecutive bytes in memory with the low-order byte at the lowest address, the 
high-order byte at the highest address. The address of a word or dword is the byte address of the 
low-order byte. 


In addition to these basic data types, the Intel486 processor supports two larger units of memory: 
pages and segments. Memory can be divided up into one or more variable-length segments, 
which can be swapped to disk or shared between programs. Memory can also be organized into 
one or more 4-Kbyte pages. Both segmentation and paging can be combined, gaining the advan- 
tages of both systems. The Intel486 processor supports both pages and segments in order to pro- 
vide maximum flexibility to the system designer. Segmentation and paging are complementary. 
Segmentation is useful for organizing memory in logical modules, and as such is a tool for the 
application programmer, while pages are useful for the system programmer for managing the 
physical memory of a system. 


3.11.1 Address Spaces 


The Intel486 processor has three distinct address spaces: logical, linear, and physical. A logical 
address (also known as a virtual address) consists of a selector and an offset. A selector is the con- 
tents of a segment register. An offset is formed by summing all of the addressing components 
(BASE, INDEX, DISPLACEMENT) discussed in Section 3.13.3, “32-BIT MEMORY AD- 
DRESSING MODES,” into an effective address. Because each task on the Intel486 processor has 
a maximum of 16 K ao" - 1) selectors, and offsets can be 4 Gbytes OQ” bits), this gives a total of 
246 bits or 64 terabytes of logical address space per task. The programmer sees this virtual address 
space. 


The segmentation unit translates the logical address space into a 32-bit linear address space. If 
the paging unit is not enabled then the 32-bit linear address corresponds to the physical address. 
The paging unit translates the linear address space into the physical address space. The physical 
address is what appears on the address pins. 


The primary difference between Real Mode and Protected Mode is how the segmentation unit 
performs the translation of the logical address into the linear address. In Real Mode, the segmen- 
tation unit shifts the selector left four bits and adds the result to the offset to form the linear ad- 
dress. While in Protected Mode every selector has a linear base address associated with it. The 
linear base address is stored in one of two operating system tables (i.e., the Local Descriptor Ta- 
ble or Global Descriptor Table). The selector's linear base address is added to the offset to form 
the final linear address. 


Figure 3-13 shows the relationship between the various address spaces. 
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Figure 3-13. Address Translation 


3.11.2 Segment Register Usage 


The main data structure used to organize memory is the segment. On the Intel486 processor, seg- 
ments are variable sized blocks of linear addresses which have certain attributes associated with 
them. There are two main types of segments: code and data. The segments are of variable size 
and can be as small as | byte or as large as 4 Gbytes hs bytes). 


In order to provide compact instruction encoding, and increase Intel486 processor performance, 
instructions do not need to explicitly specify which segment register is used. A default segment 
register is automatically chosen according to the rules of Table 3-3. In general, data references 
use the selector contained in the DS register; stack references use the SS register and Instruction 
fetches use the CS register. The contents of the Instruction Pointer provide the offset. Special seg- 
ment override prefixes allow the explicit use of a given segment register, and override the implicit 
rules listed in Table 3-3. The override prefixes also allow the use of the ES, FS and GS segment 
registers. 


There are no restrictions regarding the overlapping of the base addresses of any segments. Thus, 
all 6 segments could have the base address set to zero and create a system with a 4-Gbyte linear 
address space. This creates a system where the virtual address space is the same as the linear ad- 
dress space. Further details of segmentation are discussed in Chapter 6, “Protected Mode Archi- 
tecture.” 
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3.12 I/O SPACE 


The Intel486 processor has two distinct physical address spaces: Memory and I/O. Generally, pe- 
ripherals are placed in I/O space, although the Intel486 processor also supports memory-mapped 
peripherals. The I/O space consists of 64 Kbytes. It can be divided into 64 K 8-bit ports, 32 K 16- 
bit ports, or 16 K 32-bit ports, or any combination of ports that add up to less than 64 Kbytes. The 
64 K I/O address space refers to physical memory rather than linear address, because I/O instruc- 
tions do not go through the segmentation or paging hardware. The M/IO# pin acts as an additional 
address line, thus allowing the system designer to easily determine which address space the pro- 
cessor is accessing. 


Table 3-3. Segment Register Selection Rules 


Type of Memory Reference peered [eos 
Code Fetch cs None 
Destination of PUSH, PUSHF, INT, CALL, PUSHA Ss None 
Instructions 

Source of POP, POPA, POPF, IRET, RET instructions Ss None 
Destination of STOS, MOVS, REP STOS, REP MOVS ES None 
Instructions (DI is Base Register) 

Other Data References, with Effective Address using 

Base Register of: 

[EAX] DS 

[EBX] DS 

[ECX] DS 

[EDX] DS All 
[ESI] DS 

[EDI] DS 

[EBP] ss 

[ESP] ss 


The I/O ports are accessed via the IN and OUT I/O instructions, with the port address supplied as 
an immediate 8-bit constant in the instruction or in the DX register. All 8- and 16-bit port address- 
es are zero extended on the upper address lines. The I/O instructions cause the M/IO# pin to be 
driven low. 


I/O port addresses OOF8H through OOFFH are reserved for use by Intel. 
I/O instruction code is cacheable. 
I/O data is not cacheable. 


I/O transfers (data or code) can be bursted. 
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3.13 ADDRESSING MODES 


3.13.1 Addressing Modes Overview 


The Intel486 processor provides a total of 11 addressing modes for instructions to specify oper- 
ands. The addressing modes are optimized to allow the efficient execution of high-level languag- 
es such as C and FORTRAN, and they cover the vast majority of data references needed by high- 
level languages. 


3.13.2 Register and Immediate Modes 


The following two addressing modes provide for instructions that operate on register or immedi- 
ate operands: 


¢ Register Operand Mode: The operand is located in one of the 8-, 16- or 32-bit general 
registers. 


¢ Immediate Operand Mode: The operand is included in the instruction as part of the opcode. 


3.13.3 32-Bit Memory Addressing Modes 


The remaining modes provide a mechanism for specifying the effective address of an operand. 
The linear address consists of two components: the segment base address and an effective ad- 
dress. The effective address is calculated by using combinations of the following four address el- 
ements: 


¢ DISPLACEMENT: An 8-, or 32-bit immediate value, following the instruction. 


¢ BASE: The contents of any general purpose register. The base registers are generally used 
by compilers to point to the start of the local variable area. 


¢ INDEX: The contents of any general purpose register except for ESP. The index registers 
are used to access the elements of an array, or a string of characters. 


¢ SCALE: The index register’s value can be multiplied by a scale factor, either 1, 2, 4 or 8. 
Scaled index mode is especially useful for accessing arrays or structures. 


Combinations of these 4 components make up the 9 additional addressing modes. There is no per- 
formance penalty for using any of these addressing combinations, because the effective address 
calculation is pipelined with the execution of other instructions. The one exception is the simul- 
taneous use of Base and Index components, which requires one additional clock. 


As shown in Figure 3-14, the effective address (EA) of an operand is calculated according to the 


following formula: 
EA = Base Reg + (Index Reg * Scaling) + Displacement 


Direct Mode: The operand’s offset is contained as part of the instruction as an 8-, 16- or 32-bit 
displacement. 


Example: INC Word PTR [500] 
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Register Indirect Mode: A BASE register contains the address of the operand. 
Example: MOV [ECX], EDX 
Based Mode: A BASE register’s contents is added to a DISPLACEMENT to form the operand’s 
offset. 
Example: MOV ECX, [EAX+24] 
Index Mode: An INDEX register’s contents is added to a DISPLACEMENT to form the oper- 
and's offset. 
Example: ADD EAX, TABLE[EST] 
Scaled Index Mode: An INDEX register's contents is multiplied by a scaling factor which is add- 
ed to a DISPLACEMENT to form the operand's offset. 
Example: IMUL EBX, TABLE[ESI*4],7 
Based Index Mode: The contents of a BASE register is added to the contents of an INDEX reg- 
ister to form the effective address of an operand. 
Example: MOV EAX, [ESI] [EBX] 
Based Scaled Index Mode: The contents of an INDEX register is multiplied by a SCALING fac- 
tor and the result is added to the contents of a BASE register to obtain the operand's offset. 
Example: MOV ECX, [EDX*8] [EAX] 
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Figure 3-14. Addressing Mode Calculations 


Based Index Mode with Displacement: The contents of an INDEX Register and a BASE reg- 
ister’s contents and a DISPLACEMENT are all summed together to form the operand offset. 


Example: ADD EDX, [ESI] [EBP+OOFFFFFOH] 


Based Scaled Index Mode with Displacement: The contents of an INDEX register are multi- 
plied by a SCALING factor, the result is added to the contents of a BASE register and a DIS- 


PLACEMENT to form the operand’s offset. 


Example: MOV EAX, LOCALTABLE[EDI*4] [EBP+80] 
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3.13.4 Differences Between 16- and 32-Bit Addresses 


In order to provide software compatibility with 80286 and 8086 processors, the Intel486 proces- 
sor can execute 16-bit instructions in Real and Protected Modes. The processor determines the 
size of the instructions it is executing by examining the D bit in the CS segment Descriptor. If the 
D bit is 0 then all operand lengths and effective addresses are assumed to be 16 bits long. If the 
D bit is 1 then the default length for operands and addresses is 32 bits. In Real Mode the default 
size for operands and addresses is 16-bits. 


Regardless of the default precision of the operands or addresses, the Intel486 processor is able to 
execute either 16- or 32-bit instructions. This is specified via the use of override prefixes. Two 
prefixes, the Operand Size Prefix and the Address Length Prefix, override the value of the D bit 
on an individual instruction basis. These prefixes are automatically added by Intel assemblers. 


Example: The Intel486 processor is executing in Real Mode and the programmer needs to access 
the EAX registers. The assembler code for this might be MOV EAX, 32-bit MEMORY OP. The 
ASM486 Macro Assembler automatically determines that an Operand Size Prefix is needed and 
generates it. 


Example: The D bit is 0, and the programmer wishes to use Scaled Index addressing mode to ac- 
cess an array. The Address Length Prefix allows the use of MOV DX, TABLE[ESI*2]. The as- 
sembler uses an Address Length Prefix because, with D=0, the default addressing mode is 16- 
bits. 


Example: The D bit is 1, and the program wants to store a 16-bit quantity. The Operand Length 
Prefix is used to specify only a 16-bit value; MOV MEM16, DX. 


The OPERAND LENGTH and Address Length Prefixes can be applied separately or in combi- 
nation to any instruction. The Address Length Prefix does not allow addresses over 64 Kbytes to 
be accessed in Real Mode. A memory address which exceeds FFFFH will result in a General Pro- 
tection Fault. An Address Length Prefix only allows the use of the additional Intel486 processor 
addressing modes. 


When executing 32-bit code, the Intel486 processor uses either 8-, or 32-bit displacements, and 
any register can be used as base or index registers. When executing 16-bit code, the displace- 
ments are either 8, or 16 bits, and the base and index register conform to the 80286 processor 
model. Table 3-4 illustrates the differences. 


Table 3-4. BASE and INDEX Registers for 16- and 32-Bit Addresses 


16-Bit Addressing 32-Bit Addressing 


BASE REGISTER BX,BP Any 32-bit GP Register 


INDEX SI,DI Any 32-bit GP Register 
REGISTER Except ESP 


SCALE FACTOR none 1,2,4,8 
DISPLACEMENT 0, 8, 16 bits 0, 8, 32 bits 


| 3-29 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


3.14 Data Formats 


3.14.1 Data Types 


The Intel486 processor can support a wide-variety of data types. In the following descriptions, 
the processor consists of the base architecture registers. 


3.14.1.1 Unsigned Data Types 

Byte: Unsigned 8-bit quantity 

Word: Unsigned 16-bit quantity 

Dword: Unsigned 32-bit quantity 

The least significant bit (LSB) in a byte is bit 0, and the most significant bit is 7. 


3.14.1.2 Signed Data Types 


All signed data types assume 2’s complement notation. The signed data types contain two fields, 
a sign bit and a magnitude. The sign bit is the most significant bit (MSB). The number is negative 
if the sign bit is 1. If the sign bit is 0, the number is positive. The magnitude field consists of the 
remaining bits in the number. (Refer to Figure 3-15.) 


8-bit Integer: Signed 8-bit quantity 

16-bit Integer: Signed 16-bit quantity 
32-bit Integer: Signed 32-bit quantity 
64-bit Integer: Signed 64-bit quantity 


The integer core of the Intel486 processors only support 8-, 16- and 32-bit integers. (See Section 
3.14.1.4, “Floating-Point Data Types’) 


3.14.1.3. BCD Data Types 


The Intel486 processor supports packed and unpacked binary coded decimal (BCD) data types. 
A packed BCD data type contains two digits per byte, the lower digit is in bits 3:0 and the upper 
digit in bits 7:4. An unpacked BCD data type contains 1 digit per byte stored in bits 3:0. 


The Intel486 processor supports 8-bit packed and unpacked BCD data types. (Refer to Figure 
3-15.) 
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3.14.1.4 Floating-Point Data Types 


In addition to the base registers, the IntelIDX2™ and IntelDX4™ processors’ on-chip floating- 
point unit consists of the floating-point registers. The floating-point unit data type contain three 
fields: sign, significand, and exponent. The sign field is one bit and is the MSB of the floating- 
point number. The number is negative if the sign bit is 1. If the sign bit is 0, the number is positive. 
The significand gives the significant bits of the number. The exponent field contains the power 
of 2 needed to scale the significand. (Refer to Figure 3-15.) 


Only the FPU supports floating-point data types. 


Single Precision Real: 23-bit significand and 8-bit exponent. 32 bits total. 
Double Precision Real: 52-bit significand and 11-bit exponent. 64 bits total. 
Extended Precision Real: 64-bit significand and 15-bit exponent. 80 bits total. 


Floating-Point Unsigned Data Types 
The on-chip FPU does not support unsigned data types. (Refer to Figure 3-15.) 


Floating-Point Signed Data Types 
The on-chip FPU only supports 16-, 32- and 64-bit integers. 


Floating-Point BCD Data Types 
The on-chip FPU only supports 80-bit packed BCD data types. 
3.14.1.5 String Data Types 


A string data type is a contiguous sequence of bits, bytes, words or dwords. A string may contain 
between | byte and 4 Gbytes. (Refer to Figure 3-16.) 


String data types are only supported by the CPU section of the Intel486 processor. 


Byte String: Contiguous sequence of bytes. 

Word String: Contiguous sequence of words. 

Dword String: Contiguous sequence of dwords. 

Bit String: A set of contiguous bits. In the Intel486 processor bit strings 


can be up to 4-gigabits long. 


3.14.1.6 ASCII Data Types 


The Intel486 processor supports ASCII (American Standard Code for Information Interchange) 
strings and can perform arithmetic operations (such as addition and division) on ASCII data. The 
Intel486 processor can only operate on ASCII data. (Refer to Figure 3-16.) 
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String Data Types 
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Figure 3-16. String and ASCII Data Types 
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Figure 3-17. Pointer Data Types 
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3.14.1.7 Pointer Data Types 


A pointer data type contains a value that gives the address of a piece of data. Intel486 processors 
support the following two types of pointers (see Figure 3-17): 


¢ 48-bit Pointer: 16-bit selector and 32-bit offset 
e 32-bit Pointer: 32-bit offset 


3.14.2 Little Endian vs. Big Endian Data Formats 


The Intel486 processors, as well as all other members of the Intel architecture, use the “little-en- 
dian” method for storing data types that are larger than one byte. Words are stored in two consec- 
utive bytes in memory with the low-order byte at the lowest address and the high order byte at 
the high address. Dwords are stored in four consecutive bytes in memory with the low-order byte 
at the lowest address and the high order byte at the highest address. The address of a word or 
dword data item is the byte address of the low-order byte. 


Figure 3-18 illustrates the differences between the big-endian and little-endian formats for 
dwords. The 32 bits of data are shown with the low order bit numbered bit 0 and the high order 
bit numbered 32. Big-endian data is stored with the high-order bits at the lowest addressed byte. 
Little-endian data is stored with the high-order bits in the highest addressed byte. 


The Intel486 processor has the following two instructions that can convert 16- or 32-bit data be- 
tween the two byte orderings: 


¢ BSWAP (byte swap) handles 4-byte values 
¢ XCHG (exchange) handles 2-byte values 


m+3 m+2 m+1 m 
31 24 23 1615 87 0 
Dword in Little-Endian Memory Format 
m+2 m+3 


m m+1 
31 24 23 1615 87 0 


Dword in Big-Endian Memory Format 
A5163-01 


Figure 3-18. Big vs. Little Endian Memory Format 
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3.15 INTERRUPTS 


3.15.1 Interrupts and Exceptions 


Interrupts and exceptions alter the normal program flow, in order to handle external events, to 
report errors or exceptional conditions. The difference between interrupts and exceptions is that 
interrupts are used to handle asynchronous external events while exceptions handle instruction 
faults. Although a program can generate a software interrupt via an INT N instruction, the 
Intel486 processors treat software interrupts as exceptions. 


Hardware interrupts occur as the result of an external event and are classified into two types: 
maskable or non-maskable. Interrupts are serviced after the execution of the current instruction. 
After the interrupt handler is finished servicing the interrupt, execution proceeds with the instruc- 
tion immediately after the interrupted instruction. Section 3.15.3, “Maskable Interrupt’ and Sec- 
tion 3.15.4, “Non-Maskable Interrupt” discuss the differences between Maskable and Non- 
Maskable interrupts. 


Exceptions are classified as faults, traps, or aborts, depending on the way they are reported, and 
whether or not restart of the instruction causing the exception is supported. Faults are exceptions 
that are detected and serviced before the execution of the faulting instruction. A fault would oc- 
cur in a virtual memory system when the processor referenced a page or a segment that was not 
present. The operating system would fetch the page or segment from disk, and then the Intel486 
processor would restart the instruction. Traps are exceptions that are reported immediately after 
the execution of the instruction that caused the problem. User defined interrupts are examples of 
traps. Aborts are exceptions that do not permit the precise location of the instruction causing the 
exception to be determined. Aborts are used to report severe errors, such as a hardware error or 
illegal values in system tables. 


Thus, when an interrupt service routine has been completed, execution proceeds from the instruc- 
tion immediately following the interrupted instruction. On the other hand, the return address from 
an exception fault routine will always point at the instruction causing the exception and include 
any leading instruction prefixes. Tables 3-5 and 3-6 summarize the possible interrupts for 
Intel486 processors and shows where the return address points. 


Intel486 processors can handle up to 256 different interrupts and/or exceptions. In order to ser- 
vice the interrupts, a table with up to 256 interrupt vectors must be defined. The interrupt vectors 
are simply pointers to the appropriate interrupt service routine. In Real Mode (see Chapter 5, “Re- 
al Mode Architecture”), the vectors are 4-byte quantities, a Code Segment plus a 16-bit offset; in 
Protected Mode, the interrupt vectors are 8-byte quantities, which are put in an Interrupt Descrip- 
tor Table (see Section 6.2.3.4, “Interrupt Descriptor Table”). Of the 256 possible interrupts, 32 
are reserved for use by Intel, the remaining 224 are free to be used by the system designer. 
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3.15.2 Interrupt Processing 


When an interrupt occurs, the following actions happen. First, the current program address and 
the Flags are saved on the stack to allow resumption of the interrupted program. Next, an 8-bit 
vector is supplied to the Intel486 processor which identifies the appropriate entry in the interrupt 
table. The table contains the starting address of the interrupt service routine. Then, the user sup- 
plied interrupt service routine is executed. Finally, when an IRET instruction is executed the old 
Intel486 processor state is restored and program execution resumes at the appropriate instruction. 


The 8-bit interrupt vector is supplied to the Intel486 processor in several different ways: excep- 
tions supply the interrupt vector internally; software INT instructions contain or imply the vector; 
maskable hardware interrupts supply the 8-bit vector via the interrupt acknowledge bus sequence. 
Non-maskable hardware interrupts are assigned to interrupt vector 2. 


3.15.3 Maskable Interrupt 


Maskable interrupts are the most common way used by the Intel486 processor to respond to asyn- 
chronous external hardware events. A hardware interrupt occurs when the INTR is pulled high 
and the Interrupt Flag bit (IF) is enabled. The Intel486 processor only responds to interrupts be- 
tween instructions, (REPeat String instructions, have an “interrupt window,” between memory 
moves, which allows interrupts during long string moves). When an interrupt occurs, the Intel486 
processor reads an 8-bit vector supplied by the hardware which identifies the source of the inter- 
rupt, (one of 224 user defined interrupts). The exact nature of the interrupt sequence is discussed 
in Section 10.3.10, “Interrupt Acknowledge.” 


The IF bit in the EFLAG registers is reset when an interrupt is being serviced. This effectively 
disables servicing additional interrupts during an interrupt service routine. However, the IF may 
be set explicitly by the interrupt handler, to allow the nesting of interrupts. When an IRET in- 
struction is executed, the original state of the IF is restored. 
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Table 3-5. Interrupt Vector Assignments 


Function Interrupt | Instruction that Can Cause | points to Faulting | Type 
Instruction 
Divide Error 0 DIV, IDIV YES FAULT 
Debug Exception 1 Any instruction YES TRAPt 
NMI Interrupt 2 INT 2 or NMI NO NMI 
One Byte Interrupt 3 INT NO TRAP 
Interrupt on Overflow 4 INTO NO TRAP 
Array Bounds Check 5 BOUND FAULT 
Device Not Available 7 ESC, WAIT FAULT 
Double Fault 8 Any instruction that can ABORT 
generate an exception 
Intel Reserved 9 
Invalid TSS 10 JMP, CALL, IRET, INT FAULT 
Segment Not Present 1 Segment Register Instructions FAULT 
Stack Fault 12 Stack References FAULT 
General Protection Fault 13 Any Memory Reference FAULT 
Page Fault 14 Any Memory Access or Code FAULT 
Fetch 
Intel Reserved 15 
Alignment Check Interrupt 17 Unaligned Memory Access YES FAULT 
Intel Reserved 18-31 
Two Byte Interrupt 0-255 INT n NO TRAP 


tSome debug exceptions may report both traps on the previous instruction, and faults on the next 


instruction. 


Table 3-6. FPU Interrupt Vector Assignments 


7 Interrupt Instruction Which Can Return Address Points to 
Funetion Number Cause Exception Faulting Instruction Type 
Floating-Point Error 16 Floating-point, WAIT YES FAULT 
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3.15.4 Non-Maskable Interrupt 


Non-maskable interrupts provide a method of servicing very high priority interrupts. A common 
example of the use of a non-maskable interrupt (NMI) would be to activate a power failure rou- 
tine or SMI# to activate a power saving mode. When the NMI input is pulled high, it causes an 
interrupt with an internally supplied vector value of 2. Unlike a normal hardware interrupt, no 
interrupt acknowledgment sequence is performed for an NMI. 


While executing the NMI servicing procedure, the Intel486 processor will not service further 
NMI requests until an interrupt return (IRET) instruction is executed or the processor is reset 
(RSM in the case of SMI#). If NMI occurs while currently servicing an NMI, its presence will be 
saved for servicing after executing the first IRET instruction. The IF bit is cleared at the begin- 
ning of an NMI interrupt to inhibit further INTR interrupts. 


3.15.5 Software Interrupts 


A third type of interrupt/exception for the Intel486 processor is the software interrupt. An INT n 
instruction causes the processor to execute the interrupt service routine pointed to by the nth vec- 
tor in the interrupt table. 


A special case of the two byte software interrupt INT n is the one byte INT 3, or breakpoint in- 
terrupt. By inserting this one byte instruction in a program, the user can set breakpoints in his pro- 
gram as a debugging tool. 


A final type of software interrupt is the single step interrupt. It is discussed in Section 11.2, “Sin- 
gle-Step Trap.” 


3.15.6 Interrupt and Exception Priorities 


Interrupts are externally-generated events. Maskable Interrupts (on the INTR input) and Non- 
Maskable Interrupts (on the NMI input or SMI# input) are recognized at instruction boundaries. 
When more than one interrupt or external event are both recognized at the same instruction 
boundary, the Intel486 processor invokes the highest priority routine first. (See list below.) If, af- 
ter the NMI service routine has been invoked, maskable interrupts are still enabled, then the 
Intel486 processor will invoke the appropriate interrupt service routine. 


Priority for Servicing External Events for All Intel486 processors Except the Write-Back En- 
hanced IntelDX4 processor: 


1. RESET/SRESET 


2. FLUSH# 
3. SMI# 

4. NMI 

5. INTR 

6. STPCLK# 
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NOTE 


STPCLK# will be recognized while in an interrupt service routine or an SMM 
handler. 


For the Write-Back Enhanced IntelDX4 processor, the priority of servicing external events is 
modified from the standard Intel486 processor. The list below shows the priority for write-back 
enhanced mode 


Priority for Servicing External Events for the Write-Back Enhanced IntelDX4 processor: 


1. RESET 


2. FLUSH# 
3. SRESET 
4. SMI# 

5. NMI 

6. INTR 

7. STPCLK# 


Exceptions are internally-generated events. Exceptions are detected by the Intel486 processor if, 
in the course of executing an instruction, the Intel486 processor detects a problematic condition. 
The IntelDX4 processor then immediately invokes the appropriate exception service routine. The 
state of the Intel486 processor is such that the instruction causing the exception can be restarted. 
If the exception service routine has taken care of the problematic condition, the instruction will 
execute without causing the same exception. 


It is possible for a single instruction to generate several exceptions (for example, transferring a 
single operand could generate two page faults if the operand location spans two “not present” 
pages). However, only one exception is generated upon each attempt to execute the instruction. 
Each exception service routine should correct its corresponding exception, and restart the instruc- 
tion. In this manner, exceptions are serviced until the instruction executes successfully. 


As the Intel486 processor executes instructions, it follows a consistent cycle in checking for ex- 
ceptions. Consider the case of the Intel486 processor having just completed an instruction. It then 
performs the checks listed in Table 3-7 before reaching the point where the next instruction is 
completed. This cycle is repeated as each instruction is executed, and occurs in parallel with in- 
struction decoding and execution. Checking for EM, TS, or FPU error status only occurs for pro- 
cessors with on-chip Floating-Point Units. 
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Table 3-7. Sequence of Exception Checking 


Sequence Description 


1 Check for Exception 1 Traps from the instruction just completed (single-step via Trap Flag, or 
Data Breakpoints set in the Debug Registers). 


2 Check for Exception 1 Faults in the next instruction (Instruction Execution Breakpoint set in 
the Debug Registers for the next instruction). 


Check for external NMI and INTR. 


Check for Segmentation Faults that prevented fetching the entire next instruction (exceptions 
11 or 13). 


5 Check for Page Faults that prevented fetching the entire next instruction (exception 14). 


6 Check for Faults decoding the next instruction (exception 6 if illegal opcode; exception 6 if in 
Real Mode or in Virtual 8086 Mode and attempting to execute an instruction for Protected 
Mode only (see Section 6.5.4, “Protection and I/O Permission Bitmap”); or exception 13 if 
instruction is longer than 15 bytes, or privilege violation in Protected Mode (i.e., not at IOPL or 
at CPL=0). 


If WAIT opcode, check if TS=1 and MP=1 (exception 7 if both are 1). 
If opcode for Floating-Point Unit, check if EM=1 or TS=1 (exception 7 if either are 1). 


If opcode for Floating-Point Unit (FPU), check FPU error status (exception 16 if error status is 
asserted). 


10 Check in the following order for each memory reference required by the instruction: 


a. Check for Segmentation Faults that prevent transferring the entire memory quantity 
(exceptions 11, 12, 13). 


b. Check for Page Faults that prevent transferring the entire memory quantity (exception 14). 


NOTE: The order stated supports the concept of the paging mechanism being “underneath” the 
segmentation mechanism. Therefore, for any given code or data reference in memory, 
segmentation exceptions are generated before paging exceptions are generated. 


3.15.7 Instruction Restart 


The Intel486 processor fully supports restarting all instructions after faults. If an exception is de- 
tected in the instruction to be executed (exception categories 4 through 10 in Table 3-8), the 
Intel486 processor invokes the appropriate exception service routine. 


The Intel486 processor is in a state that permits restart of the instruction, for all cases except the 
following. An instruction causes a task switch to a task whose Task State Segment is partially 
“not present.” (An entirely “not present” TSS is restartable.) Partially present TSSs can be avoid- 
ed either by keeping the TSSs of such tasks present in memory, or by aligning TSS segments to 
reside entirely within a single 4 K page (for TSS segments of 4 Kbytes or less). 


NOTE 


Partially present task state segments can be easily avoided by proper design of 
the operating system. 
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3.15.8 Double Fault 


A Double Fault (exception 8) results when the Intel486 processor attempts to invoke an exception 
service routine for the segment exceptions (10, 11, 12 or 13), but in the process of doing so, de- 
tects an exception other than a Page Fault (exception 14). 


A Double Fault (exception 8) will also be generated when the Intel486 processor attempts to in- 
voke the Page Fault (exception 14) service routine, and detects an exception other than a second 
Page Fault. In any functional system, the entire Page Fault service routine must remain “present” 
in memory. 


When a Double Fault occurs, the Intel486 processor invokes the exception service routine for ex- 
ception 8. 


3.15.9 Floating-Point Interrupt Vectors 


Several interrupt vectors of the IntelDX2 and IntelDX4 processors are used to report exceptional 
conditions while executing numeric programs in either real or protected mode. Table 4-19 shows 
these interrupts and their causes. 


Table 3-8. Interrupt Vectors Used by FPU 


Interrupt Number Cause of Interrupt 


7 A Floating-Point instruction was encountered when EM or TS of the IntelDX2™ and 
IntelIDX4™ processor control register zero (CRO) was set. EM = 1 indicates that 
software emulation of the instruction is required. When TS is set, either a Floating- 
Point or WAIT instruction causes interrupt 7. This indicates that the current FPU 
context may not belong to the current task. 


13 The first word or doubleword of a numeric operand is not entirely within the limit of 
its segment. The return address pushed onto the stack of the exception handler 
points at the Floating-Point instruction that caused the exception, including any 
prefixes. The FPU has not executed this instruction; the instruction pointer and data 
pointer register refer to a previous, correctly executed instruction. 


16 The previous numerics instruction caused an unmasked exception. The address of 
the faulty instruction and the address of its operand are stored in the instruction 
pointer and data pointer registers. Only Floating-Point and WAIT instructions can 
cause this interrupt. The IntelDX2 and IntelDX4 processors return address pushed 
onto the stack of the exception handler points to a WAIT or Floating-Point 
instruction (including prefixes). This instruction can be restarted after clearing the 
exception condition in the FPU. The FNINIT, FNCLEX, FNSTSW, FNSTENV, and 
FNSAVE instructions can not cause this interrupt. 
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CHAPTER 4 
SYSTEM REGISTER ORGANIZATION 


4.1. REGISTER SET OVERVIEW 
The Intel486™ processor register set can be split into the following categories: 
¢ Base Architecture Registers 
— General Purpose Registers 
— Instruction Pointer 
— Flags Register 
— Segment Registers 
¢ System-Level Registers 
— Control Registers 
— System Address Registers 
¢ Debug and Test Registers 


The base architecture and floating-point registers (see below) are accessible by the applications 
program. The system-level registers can only be accessed at privilege level 0 and can only be used 
by system-level programs. The debug and test registers also can only be accessed at privilege 
level 0. 


4.2 FLOATING-POINT REGISTERS 


In addition to the registers listed above, the IntelIDX2™ and IntelDX4™ processors have the fol- 
lowing: 


¢ Floating-Point Registers 

¢ Data Registers 

¢ Tag Word 

¢ Status Word 

¢ Instruction and Data Pointers 


¢ Control Word 
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4.3. BASE ARCHITECTURE REGISTERS 


Figure 4-1 shows the Intel486 processor base architecture registers. The contents of these regis- 
ters are task-specific and are automatically loaded with a new context upon a task switch opera- 
tion. 


The base architecture includes six directly accessible descriptors, each specifying a segment up 
to 4 Gbytes in size. The descriptors are indicated by the selector values placed in the Intel486 pro- 
cessor segment registers. Various selector values can be loaded as a program executes. 


NOTE 
In register descriptions, “set” means “set to 1,” and “reset” means “set to 0.” 


4-2 | 


i ntel @ SYSTEM REGISTER ORGANIZATION 


General Purpose Registers 
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GS 
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31 1615 ) 


Flags Register 


FLAGS EFLAGS 
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Figure 4-1. Base Architecture Registers 
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4.3.1. General Purpose Registers 


Figure 4-1 on page 4-3 shows the eight 32-bit general purpose registers. These registers hold data 
or address quantities. The general purpose registers can support data operands of 1, 8, 16 and 32 
bits, and bit fields of 1 to 32 bits. Address operands of 16 and 32 bits are supported. The 32-bit 
registers are named EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP. 


The least significant 16 bits of the general purpose registers can be accessed separately using the 
16-bit names of the registers AX, BX, CX, DX, SI, DI, BP and SP. The upper 16 bits of the reg- 
ister are not changed when the lower 16 bits are accessed separately. 


Finally, 8-bit operations can individually access the lower byte (bits 7:0) and the highest byte (bits 
15:8) of the general purpose registers AX, BX, CX and DX. The lowest bytes are named AL, BL, 
CL and DL, respectively. The higher bytes are named AH, BH, CH and DH, respectively. The 
individual byte accessibility offers additional flexibility for data operations, but is not used for 
effective address calculation. 


4.3.2 Instruction Pointer 


The instruction pointer shown in Figure 4-1 is a 32-bit register named EJP. EJP holds the offset 
of the next instruction to be executed. The offset is always relative to the base of the code segment 
(CS). The lower 16 bits (bits 15:0) of the EIP contain the 16-bit instruction pointer named IP, 
which is used for 16-bit addressing. 


4.3.3 Flags Register 


The flags register is a 32-bit register named EFLAGS. The defined bits and bit fields within 
EFLAGS control certain operations and indicate the status of the Intel486 processor. The lower 
16 bits (bit 15:0) of EFLAGS contain the 16-bit register named FLAGS, which is most useful 
when executing 8086 and 80286 processor code. Figure 4-2 shows the EFLAGS register. 


EFLAGS bits 1, 3, 5, 15, and 22 to 31 are defined as “Intel Reserved.” When these bits are stored 
during interrupt processing or with a PUSHF instruction (push flags onto stack), a “1” is stored 
in bit 1 and zeros are stored in bits 3, 5, 15, and 22 to 31. 
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Flags 


31 30 29 28 27 26 25 24 23 2221 20 191817161514131211109 8 76543210 


VIV 
A|VIR NJ IOP }O}D] I]T{S]Z] FAT, IP], IC 
g 


Identification Fla 
Virtual Interrupt Pending 
Virtual Interrupt Flag 
Alignment Check 
Virtual Mode 
Resume Flag 
Nested Task Flag 
V/O Privilege Level 
Overflow 

Direction Flag 
Interrupt Enable 


EFLAGS 


Carry Flag 
Parity Flag 
Auxillary Flag 
Zero Flag 
Sign Flag 
Trap Flag 


[o] Indicates Intel Reserved; do not define. 


See Section 4.8, “Compatibility with Future Processors” 
A5145-01 


Figure 4-2. Flag Registers 


ID (dentification Flag, bit 21) 


The ability of a program to set and clear the ID flag indicates that the processor supports the 
CPUID instruction. (Refer to Chapter 12, “Instruction Set Summary” and Appendix D, 
“Feature Determination.’’) 


VIP (Virtual Interrupt Pending Flag, bit 20) 


The VIP flag together with the VIF enable each applications program in a multi-tasking 
environment to have virtualized versions of the system's IF flag. For more on the use of this 
flag in Virtual-8086 Mode and in Protected Mode, refer to Appendix C, “Advanced 
Features.” 


VIF (Virtual Interrupt Flag, bit 19) 


The VIF is a virtual image of the IF (interrupt flag) used with VIP. For more on the use of 
this flag in Virtual-86 Mode and in Protected Mode refer to Appendix C, “Advanced 
Features.” 
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AC (Alignment Check, bit 18) 


4-6 


The AC bit is defined in the upper 16 bits of the register. It enables the generation of faults 
when a memory reference is to a misaligned address. Alignment faults are enabled when 
AC is set to 1. A misaligned address is a word access to an odd address, a dword access to 
an address that is not on a dword boundary, or an 8-byte reference to an address that is not 
on a 64-bit word boundary. (See Section 10.1.5, "Operand Alignment") 


Alignment faults are only generated by programs running at privilege level 3. The AC bit 
setting is ignored at privilege levels 0, 1, and 2. Note that references to the descriptor tables 
(for selector loads), or the task state segment (TSS), are implicitly level 0 references even 
when the instructions causing the references are executed at level 3. Alignment faults are 
reported through interrupt 17, with an error code of 0. Table 4-1 gives the alignment 
required for the Intel486 processor data types. 


Table 4-1. Data Type Alignment Requirements 


Memory Access jes Bonncea 
Word 2 
Dword 4 
Single Precision Real 4 
Double Precision Real 8 
Extended Precision Real 8 
Selector 2 
48-bit Segmented Pointer 4 
32-bit Segmented Pointer 2 
48-bit “Pseudo-Descriptor” 4 
FSTENV/FLDENV Save Area 4/2 (On Operand Size) 
FSAVE/FRSTOR Save Area 4/2 (On Operand Size) 
Bit String 4 
NOTE 


Several instructions on the Intel486 processor generate misaligned references, 
even when their memory address is aligned. For example, on the Intel486 
processor, the SGDT/SIDT (store global/interrupt descriptor table) instruction 
reads/writes two bytes, and then reads/writes four bytes from a “pseudo- 
descriptor” at the given address. The Intel486 processor generates misaligned 
references unless the address is on a 2 mod 4 boundary. The FSAVE and 
FRSTOR instructions (floating-point save and restore state) generate 
misaligned references for one-half of the register save/restore cycles. The 
Intel486 processor does not cause any AC faults when the effective address 
given in the instruction has the proper alignment. 
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VM (Virtual 8086 Mode, bit 17) 


The VM bit provides Virtual 8086 Mode within Protected Mode. When the VM bit is set 
while the Intel486 processor is in Protected Mode, the Intel486 processor switches to 
Virtual 8086 operation, handling segment loads as the 8086 processor does, but generating 
exception 13 faults on privileged opcodes. The VM bit can be set only in Protected Mode 
by the IRET instruction (when current privilege level = 0) and by task switches at any 
privilege level. The VM bit is unaffected by POPF. PUSHF always pushes a 0 in this bit, 
even when executing in Virtual 8086 Mode. The EFLAGS image pushed during interrupt 
processing or saved during task switches contains a | in this bit if the interrupted code was 
executing as a Virtual 8086 Task. 


RF (Resume Flag, bit 16) 


The RF flag is used in conjunction with the debug register breakpoints. It is checked at 
instruction boundaries before breakpoint processing. When RF is set, it causes any debug 
fault to be ignored on the next instruction. RF is then automatically reset at the successful 
completion of every instruction (no faults are signaled) except the IRET instruction, the 
POPF instruction, (and JMP, CALL, and INT instructions causing a task switch). These 
instructions set RF to the value specified by the memory image. For example, at the end of 
the breakpoint service routine, the IRET instruction can pop an EFLAG image having the 
RF bit set and resume the program’s execution at the breakpoint address without generating 
another breakpoint fault on the same location. 


NT (Nested Task, bit 14) 


The flag applies to Protected Mode. NT is set to indicate that the execution of this task is 
within another task. When set, it indicates that the current nested task’s Task State Segment 
(TSS) has a valid back link to the previous task’s TSS. This bit is set or reset by control 
transfers to other tasks. The value of NT in EFLAGS is tested by the IRET instruction to 
determine whether to do an inter-task return or an intra-task return. A POPF or an IRET 
instruction affects the setting of this bit according to the image popped, at any privilege 
level. 


IOPL (Input/Output Privilege Level, bits 12-13) 


This two-bit field applies to Protected Mode. IOPL indicates the numerically maximum 
CPL (current privilege level) value permitted to execute I/O instructions without generating 
an exception 13 fault or consulting the I/O Permission Bitmap. It also indicates the 
maximum CPL value allowing alteration of the IF (NTR Enable Flag) bit when new values 
are popped into the EFLAG register. POPF and IRET instruction can alter the IOPL field 
when executed at CPL = 0. Task switches can always alter the IOPL field, when the new 
flag image is loaded from the incoming task’s TSS. 


OF (Overflow Flag, bit 11) 
The OF bit is set when the operation results in a signed overflow. Signed overflow occurs 
when the operation resulted in carry/borrow into the sign bit (high-order bit) of the result 
but did not result in a carry/borrow out of the high-order bit, or vice-versa. For 8-, 16-, 32- 
bit operations, OF is set according to overflow at bit 7, 15, and 31, respectively. 
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DF (Direction Flag, bit 10) 


DF defines whether ESI and/or EDI registers post decrement or post increment during the 
string instructions. Post increment occurs when DF is reset. Post decrement occurs when 
DF is set. 


IF (NTR Enable Flag, bit 9) 


The IF flag, when set, allows recognition of external interrupts signaled on the INTR pin. 
When IF is reset, external interrupts signaled on the INTR are not recognized. IOPL 
indicates the maximum CPL value allowing alteration of the IF bit when new values are 
popped into EFLAGS or FLAGS. 


TF (Trap Enable Flag, bit 8) 
TF controls the generation of the exception | trap when the processor is single-stepping 
through code. When TF is set, the Intel486 processor generates an exception | trap after the 
next instruction is executed. When TF is reset, exception | traps occur only as a function of 
the breakpoint addresses loaded into debug registers DR[3:0]. 

SF (Sign Flag, bit 7) 
SF is set if the high-order bit of the result is set; otherwise, it is reset. For 8-, 16-, 32-bit 
operations, SF reflects the state of bits 7, 15, and 31 respectively. 

ZF (Zero Flag, bit 6) 
ZF is set if all bits of the result are 0; otherwise, it is reset. 


AF (Auxiliary Carry Flag, bit 4) 
The Auxiliary Flag is used to simplify the addition and subtraction of packed BCD 
quantities. AF is set if the operation resulted in a carry out of bit 3 (addition) or a borrow 
into bit 3 (subtraction). Otherwise, AF is reset. AF is affected by carry out of, or borrow 
into bit 3 only, regardless of overall operand length: 8, 16 or 32 bits. 


PF (Parity Flags, bit 2) 


PF is set if the low-order eight bits of the operation contain an even number of “I's” (even 
parity). PF is reset if the low-order eight bits have odd parity. PF is a function of only the 
low-order eight bits, regardless of operand size. 


CF (Carry Flag, bit 0) 


CF is set if the operation resulted in a carry out of (addition), or a borrow into (subtraction) 
the high-order bit. Otherwise, CF is reset. For 8-, 16-, or 32-bit operations, CF is set 
according to carry/borrow at bit 7, 15, or 31, respectively. 
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4.3.4 Segment Registers 


Six 16-bit segment registers hold segment selector values identifying the currently addressable 
memory segments. In Protected Mode, each segment may range in size from one byte up to the 
entire linear and physical address space of the machine, 4 Gbytes (0° bytes). In Real Mode, the 
maximum segment size is fixed at 64 Kbytes aie bytes). 


The six addressable segments are defined by the segment registers CS, SS, DS, ES, FS and GS. 
The selector in CS indicates the current code segment; the selector in SS indicates the current 
stack segment; the selectors in DS, ES, FS, and GS indicate the current data segments. 


4.3.5 Segment Descriptor Cache Registers 


The segment descriptor cache registers are not programmer-visible, but it is useful to understand 
their content. A programmer-invisible descriptor cache register is associated with each program- 
mer-visible segment register, as shown in Figure 4-3. Each descriptor cache register holds a 32- 
bit base address, a 32-bit segment limit, and the other necessary segment attributes. 


Segment Registers Descriptor Registers (Loaded Automatically) 


a 


15 0 


Other Segment Attributes 
Physical Base Address Segment Limit from Descriptor 


A5147-01 


Figure 4-3. Intel486™ Processor Segment Registers and Associated Descriptor Cache 
Registers 


When a selector value is loaded into a segment register, the associated descriptor cache register 
is automatically updated with the correct information. In Real Mode, only the base address is up- 
dated directly (by shifting the selector value four bits to the left), because the segment maximum 
limit and attributes are fixed in Real Mode. In Protected Mode, the base address, the limit, and 
the attributes are all updated with the contents of the segment descriptor indexed by the selector. 
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When a memory reference occurs, the segment descriptor cache register associated with the seg- 
ment being used is automatically involved with the memory reference. The 32-bit segment base 
address becomes a component of the linear address calculation, the 32-bit limit is used for the 
limit-check operation, and the attributes are checked against the type of memory reference re- 
quested. 


4.4 SYSTEM-LEVEL REGISTERS 


Figure 4-4 illustrates the system-level registers, which are the control operation of the on-chip 
cache, the on-chip floating-point unit (on the IntelDX2 and IntelDX4 processors) and the seg- 
mentation and paging mechanisms. These registers are only accessible to programs running at 
privilege level 0, the highest privilege level. 


The system-level registers include three control registers and four segmentation base registers. 
The three control registers are CRO, CR2 and CR3. CRI is reserved for future Intel processors. 
The four segmentation base registers are the Global Descriptor Table Register (GDTR), the In- 
terrupt Descriptor Table Register (IDTR), the Local Descriptor Table Register (LDTR) and the 
Task State Segment Register (TR). 


31 24/23 16/15 8]7 0 


47 32-Bit Linear Base Address 1615 Limit 0 


IDTR 


System 
Segment Registers Descriptor Registers (Loaded Automatically) 


[oe th eae, eee aR ~ Ra 


15 0 32-Bit Linear Base Address 20-Bit Segment Limit Attributes 


TR 


LDTR 
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Figure 4-4. System-Level Registers 
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24] 23 16]15 8 


31 7 0 
PICIN A| |W Ny, JTyE P 
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See 


MSW 


CRO 


[] Indicates Intel Reserved; do not define. 


Note: 
See section 4.2.7 "Compatibility." 
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Figure 4-5. Control Register 0 


4.4.1. Control Registers 
Control Register 0 (CRO) 


CRO, shown in Figure 4-5, contains 10 bits for control and status purposes. The function of the 
bits in CRO can be categorized as follows: 


¢ Intel486 Processor Operating Modes: PG, PE (Table 4-2) 
¢ On-Chip Cache Control Modes: CD, NW (Table 4-3) 


¢ On-Chip Floating-Point Unit: NE, TS, EM, TS (Tables 4-4, 4-5, and 4-6). (Also applies for 
the Intel486 SX processor.) 


¢ Alignment Check Control: AM 


¢ Supervisor Write Protect: WP 


Table 4-2. Intel486™ Processor Operating Modes 


0 Real Mode. Exact 8086 processor semantics, with 32-bit extensions available with prefixes. 


1 Protected Mode. Exact 80286 processor semantics, plus 32-bit extensions through both 
prefixes and “default” prefix setting associated with code segment descriptors. Also, a sub- 
mode is defined to support a virtual 8086 processor within the context of the extended 
80286 processor protection model. 


1 0 Undefined. Loading CRO with this combination of PG and PE bits causes a GP fault with 
error code 0. 


1 1 Paged Protected Mode. All the facilities of Protected Mode, with paging enabled underneath 
segmentation. 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


Table 4-3. On-Chip Cache Control Modes 


cD NW Operating Mode 
1 1 Cache fills disabled, write-through and invalidates disabled. 
1 0 Cache fills disabled, write-through and invalidates enabled. 
0 1 INVALID. If CRO is loaded with this configuration of bits, a GP fault with error code results. 
0 0 Cache fills enabled, write-through and invalidates enabled. 


The low-order 16 bits of CRO are also known as the Machine Status Word (MSW), for compati- 
bility with the 80286 processor Protected Mode. LMSW and SMSW (load and store MSW) in- 
structions are taken as special aliases of the load and store CRO operations, where only the low- 
order 16 bits of CRO are involved. The LMSW and SMSW instructions in the Intel486 processor 
work in an identical fashion to the LMSW and SMSW instructions in the 80286 processor (i.e., 
they operate only on the low-order 16 bits of CRO and ignore the new bits). New Intel486 pro- 
cessor operating systems should use the MOV CRO, Reg instruction. 


NOTE 


All Intel386 and Intel486 processor CRO bits, except for ET and NE, are 
upwardly compatible with the 80286 processor, because they are in register 
bits not defined in the 80286 processor. For strict compatibility with the 80286 
processor, the load machine status word (LMSW) instruction is defined to not 
change the ET or NE bits. 


The defined CRO bits are described as follows. 


The PG (Paging Enable, bit 31) 
PG bit is used to indicate whether paging is enabled (PG=1) or disabled (PG=0). (See 
Table 4-2.) 

CD (Cache Disable, bit 30) 


The CD bit is used to enable the on-chip cache. When CD=1, the cache is not filled on 
cache misses. When CD=0, cache fills may be performed on misses. (See Table 4-3.) 

The state of the CD bit, the cache enable input pin (KEN#), and the relevant page cache 
disable (PCD) bit determine whether a line read in response to a cache miss will be installed 
in the cache. A line is installed in the cache only when CD=0 and KEN# and PCD are both 
zero. The relevant PCD bit comes from either the page table entry, page directory entry or 
control register 3. (Refer to Section 6.4.4, "Page Cacheability (PWT and PCD Bits)") 


CD is set to “1” after RESET. 
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NW (Not Write-Through, bit 29) 
The NW bit enables on-chip cache write-throughs and write-invalidate cycles (NW=0). 
When NW=0, all writes, including cache hits, are sent out to the pins. Invalidate cycles are 
enabled when NW=0. During an invalidate cycle, a line is removed from the cache if the 
invalidate address hits in the cache. (See Table 4-3.) 


When NW=1, write-throughs and write-invalidate cycles are disabled. A write is not sent to 
the pins if the write hits in the cache. With NW=1 the only write cycles that reach the 
external bus are cache misses. Write hits with NW=1 never update main memory. 
Invalidate cycles are ignored when NW=1. 


AM (Alignment Mask, bit 18) 
The AM bit controls whether the alignment check (AC) bit in the flag register (EFLAGS) 
can allow an alignment fault. AM=0 disables the AC bit. AM=1 enables the AC bit. AM=0 
is the Intel386 processor compatible mode. 
Intel386 processor software may load incorrect data into the AC bit in the EFLAGS 
register. Setting AM=0 prevents AC faults from occurring before the Intel486 processor has 
created the AC interrupt service routine. 


WP (Write Protect, bit 16) 
WP protects read-only pages from supervisor write access. The Intel386 processor allows a 
read-only page to be written from privilege levels 0 to 2. The Intel486 processor are 
compatible with the Intel386 processor when WP = 0. WP = 1 forces a fault on a write to a 
read-only page from any privilege level. Operating systems with Copy-on-Write features 
can be supported with the WP bit. (Refer to Section 6.4.3, "Page Level Protection (R/W, 
USS Bits)") 


NOTE 


Refer to Tables 4-4, 4-5, and 4-6 for values and interpolation of NE, EM, TS, 
and MP bits, in addition to the sections below. 
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NE (Numerics Exception, bit 5) 


Intel486 SX Processor NE bit: 


For the Intel486 SX processor, interrupt 7 is generated upon encountering any floating- 
point instruction regardless of the value of the NE bit. It is recommended that NE=1 for 
normal operation of the processor. 


IntelDX2 and IntelDX4 Processor NE bit: 


For IntelDX2 and IntelDX4 processors, the NE bit controls whether unmasked floating- 
point exceptions (UFPE) are handled through interrupt vector 16 (NE=1) or through an 
external interrupt (NE=0). NE=0 (default at reset) supports the DOS operating system error 
reporting scheme from the 8087, Intel287 and Intel387 math coprocessors. In DOS systems, 
math coprocessor errors are reported via external interrupt vector 13. DOS uses interrupt 
vector 16 for an operating system call. (Refer to Section 9.2.15, "Numeric Error Reporting 
(FERR#, IGNNE#)" and Section 10.3.14, "Floating-Point Error Handling for the 
IntelIDX2™ and IntelDX4™ Processors") 


For any UFPE, the floating-point error output pin (FERR#) is driven active. 


For NE=0, the IntelDX2 and IntelDX4 processors work in conjunction with the ignore 
numeric error input (IGNNE#) and the FERR# output pins. When a UFPE occurs and the 
IGNNE# input is inactive, the IntelDX2 and IntelDX4 processors freeze immediately 
before executing the next floating-point instruction. An external interrupt controller 
supplies an interrupt vector when FERR# is driven active. The UFPE is ignored if IGNNE# 
is active and floating-point execution continues. 


NOTE 
The freeze does not take place when the next instruction is one of the control 
instructions FNCLEX, FNINIT, FNSAVE, FNSTENV, FNSTCW, FNSTSW, 
FNSTSW AX, FNENI, FNDISI and FNSETPM. The freeze does occur when 
the next instruction is WAIT. 


For NE=1, any UFPE results in a software interrupt 16, immediately before 
executing the next non-control floating-point or WAIT instruction. The ignore 
numeric error input IGNNE#) signal is ignored. 


TS (Task Switch, bit 3) 


Intel486 SX Processor TS bit: 


For the Intel486 SX processor, the TS bit is set whenever a task switch operation is 
performed. Execution of floating-point instructions with TS=1 causes a Device Not 
Available (DNA) fault (trap vector 7). With MP=0, the value of TS bit is a don’t care for the 
WAIT instructions; i.e., these instructions do not generate trap 7. 


IntelIDX2 and IntelDX4 Processor TS bit: 


For IntelDX2 and IntelDX4 processors, the TS bit is set whenever a task switch operation is 
performed. Execution of floating-point instructions with TS=1 causes a Device Not 
Available (DNA) fault (trap vector 7). If TS=1 and MP=1 (monitor coprocessor in CRO), a 
WAIT instruction causes a DNA fault. 
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EM (Emulate Coprocessor, bit 2) 


¢ Intel486 SX Processor EM bit: 


For the Intel486 SX processor, the EM bit should be set to one. This causes the Intel486 SX 
processor to trap via interrupt vector 7 (Device Not Available) to a software exception 
handler whenever it encounters a floating-point instruction. If EM bit is 0 for the 

Intel486 SX processors, the system hangs. (See Tables 4-4 and 4-5.) 


¢ IntelDX2 and IntelDX4 Processor EM bit: 


For the IntelIDX2, and IntelDX4 processors, the EM bit determines whether floating-point 
instructions are trapped (EM=1) or executed. If EM=1, all floating-point instructions cause 
fault 7. 


If EM=0, the on-chip floating-point is used. 


NOTE 
WAIT instructions are not affected by the state of EM. 
(See Tables 4-4 and 4-6.) 


MP (Monitor Coprocessor, bit 1) 


¢ Intel486 SX Processor MP bit: 
For the Intel486 SX processor, the MP bit must be set to zero (MP=0). The MP bit is used in 
conjunction with the TS bit to determine whether WAIT instructions should trap. For 


MP=0, the value of TS is a don’ care for these type of instructions. (See Tables 4-4 and 
4-5.) 


¢ IntelDX2 and IntelDX4 Processor MP bit: 


For the IntelDX2 and IntelDX4 processors, the MP is used in conjunction with the TS bit to 
determine whether WAIT instructions cause fault 7. (See Table 4-6.) The TS bit is set to 1 
on task switches by the IntelDX2 and IntelDX4 processors. Floating-point instructions are 
not affected by the state of the MP bit. It is recommended that the MP bit be set to one for 
normal processor operation. 


PE (Protection Enable, bit 0) 


The PE bit enables the segment based protection mechanism when PE=1 protection is 
enabled. When PE=0 the Intel486 processor operates in Real Mode, with segment based 
protection disabled and addresses formed as in an 8086 processor. (Refer to Table 4-2.) 
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Table 4-4. Recommended Values of NE, EM, TS, and MP Bits in CRO Register for the 
Intel486™ SX Processor 


CRO Bit Instruction Type 
NE EM TS MP FP WAIT 
1 1 0 Trap7 Execute 
1 1 1 Trap7 Execute 


Table 4-5. Recommended Values of the Floating-Point Related Bits for 


Intel486™ Processors 


CRO Bit 


Intel486™ SX Processor 


IntelDX2™, and 
IntelDX4™ Processors 


EM 


MP 


NE 


0 for DOS Systems 


1 for User-Defined Exception Handler 


Table 4-6. Interpretation of Different Combinations of the EM, TS and MP Bits for All 


Intel486™ Processors 


CRO Bit Instruction Type 
EM TS MP Floating-Point Wait 
0 0 0 Execute Execute 
0 0 Execute Execute 
0 1 0 Exception 7 Execute 
0 1 1 Exception 7 Exception 7 
1 0 0 Exception 7 Execute 
1 0 1 Exception 7 Execute 
1 1 0 Exception 7 Execute 
1 1 1 Exception 7 Exception 7 
NOTE: For IntelDX2™ and IntelDX4™ processors, when MP=1 and TS=1, the processor generates a trap 7 
so that the system software can save the floating-point status of the old task. 
4-16 


intel. SYSTEM REGISTER ORGANIZATION 


Page Fault Linear Address Register CR2 


Page Directory Base Register 


Intel Reserved 


Lo | indicates Intel reserved; Do not define. 
NOTE: See Section 4.8, “Compatibility with Future Processors.” 


Figure 4-6. Control Registers 2, 3, and 4 


Control Register | (CR1) 
CR1 is reserved for use in future Intel processors. 


Control Register 2 (CR2) 
CR2, shown in Figure 4-6, holds the 32-bit linear address that caused the last page fault 
detected. The error code pushed onto the page fault handler’s stack when it is invoked 
provides additional status information on this page fault. 
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Control Register 3 (CR3) 


CR3, shown in Figure 4-6, contains the physical base address of the page directory table. 
The page directory is always page aligned (4 Kbyte-aligned). This alignment is enforced by 
only storing bits 12-31 in CR3. 


In the Intel486 processor, CR3 contains two bits, page write-through (PWT) (bit 3) and 
page cache disable (PCD) (bit 4). The page table entry (PTE) and page directory entry 
(PDE) also contain PWT and PCD bits. PWT and PCD control page cacheability. When a 
page is accessed in external memory, the states of PWT and PCD are driven out on the 
PWT and PCD pins. The source of PWT and PCD can be CR3, the PTE or the PDE. PWT 
and PCD are sourced from CR3 when the PDE is being updated. When paging is disabled 
(PG = 0 in CRO), PCD and PWT are assumed to be 0, regardless of their state in CR3. 


A task switch through a task state segment (TSS) which changes the values in CR3, or an 
explicit load into CR3 with any value, invalidates all cached page table entries in the 
translation lookaside buffer (TLB). 

The page directory base address in CR3 is a physical address. The page directory can be 
paged out while its associated task is suspended, but the operating system must ensure that 
the page directory is resident in physical memory before the task is dispatched. The entry in 
the TSS for CR3 has a physical address, with no provision for a present bit. This means that 
the page directory for a task must be resident in physical memory. The CR3 image in a TSS 
must point to this area, before the task can be dispatched through its TSS. 


Control Register 4 (CR4) 
CR4, shown in Figure 4-6, contains bits that enable Virtual-86 Mode extensions and 
Protected Mode virtual interrupts. 

VME (Virtual-8086 Mode Extensions, bit 0 of CR4) 


Setting this bit to 1 enables support for a virtual interrupt flag in Virtual-86 Mode. This 
feature can improve the performance of Virtual-86 Mode applications by eliminating the 
overhead of faulting to a Virtual-86 Mode monitor for emulation of certain operations. 
(Refer to Appendix A, “Advanced Features.’’) 


PVI (Protected-Mode Virtual Interrupts, bit 1 of CR4) 
Setting this bit to 1 enables support for a virtual interrupt flag in Protected Mode. This 
feature can enable some programs designed for execution at privilege level 0 to execute at 
privilege level 3. (Refer to Appendix A, “Advanced Features.”’) 

PSE (Page Size Extensions, bit 4 of CR4) 


Setting this bit to 1 enables 4-Mbyte pages. (Refer to Appendix A, “Advanced Features.”) 


NOTE 


Features described in CR4 (VME, PVI, and PSE) in the CPUID Feature Flag 
should be qualified with the CPUID instruction. The CPUID instruction and 
CPUID Feature Flag are specific to particular models in the Intel486 processor 
family. (Refer to Appendix B, “Feature Determination.’’) 
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4.4.2 System Address Registers 


Four special registers are defined to reference the tables or segments supported by the 80286, 
Inte1386, and Intel486 processors’ protection model. These tables or segments are: GDT (Global 
Descriptor Table), IDT (Interrupt Descriptor Table), LDT (Local Descriptor Table), TSS (Task 
State Segment). 


The addresses of these tables and segments are stored in special registers: the System Address 
and System Segment Registers, illustrated in Figure 4-4. These registers are named GDTR, 
IDTR, LDTR, and TR respectively. Chapter 6, “Protected Mode Architecture” describes how to 
use these registers. 


System Address Registers: GDTR and IDTR 


The GDTR and IDTR hold the 32-bit linear-base address and 16-bit limit of the GDT and 
IDT, respectively. 


Because the GDT and IDT segments are global to all tasks in the system, the GDT and IDT 
are defined by 32-bit linear addresses (subject to page translation when paging is enabled) 
and 16-bit limit values. 

System Segment Registers: LDTR and TR 


The LDTR and TR hold the 16-bit selector for the LDT descriptor and the TSS descriptor, 
respectively. 


Because the LDT and TSS segments are task-specific segments, the LDT and TSS are 
defined by selector values stored in the system segment registers. 


NOTE 


A programmer-invisible segment descriptor register is associated with each 
system segment register. 
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4.5 FLOATING-POINT REGISTERS 


Figure 4-7 shows the floating-point register set. The on-chip FPU contains eight data registers, a 
tag word, a control register, a status register, an instruction pointer and a data pointer. 


The operation of the IntelDX2 and IntelDX4 processor on-chip floating-point unit is exactly the 
same as the Intel387 math coprocessor. Software written for the Intel387 math coprocessor runs 
on the on-chip floating-point unit (FPU) without modifications. 


4.5.1 Floating-Point Data Registers 


Floating-point computations use the IntelD X2 and IntelDX4 processor FPU data registers. These 
eight 80-bit registers provide the equivalent capacity of twenty 32-bit registers. Each of the eight 
data registers is divided into “fields” corresponding to the FPU’s extended-precision data type. 


Tag Field 
6463 0 1 0 


8 
Significand 


47 0 
Instruction Pointer 
Data Pointer 


A5150-01 


Figure 4-7. Floating-Point Registers 
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The FPU’s register set can be accessed either as a stack, with instructions operating on the top 
one or two stack elements, or as a fixed register set, with instructions operating on explicitly des- 
ignated registers. The TOP field in the status word identifies the current top-of-stack register. A 
“push” operation decrements TOP by one and loads a value into the new top register. A “pop” 
operation stores the value from the current top register and then increments TOP by one. Like 
other IntelDX2 and IntelDX4 processor stacks in memory, the FPU register stack grows “down” 
toward lower-addressed registers. 


Instructions may address the data registers either implicitly or explicitly. Many instructions op- 
erate on the register at the TOP of the stack. These instructions implicitly address the register at 
which TOP points. Other instructions allow the programmer to explicitly specify which register 
to use. This explicit register addressing is also relative to TOP. 


4.5.2 Floating-Point Tag Word 


The tag word marks the content of each numeric data register, as shown in Figure 4-8. Each two- 
bit tag represents one of the eight data registers. The principal function of the tag word is to op- 
timize the FPU’s performance and stack handling by making it possible to distinguish between 
empty and non-empty register locations. It also enables exception handlers to check the contents 
of a stack location without the need to perform complex decoding of the actual data. 


15 0 
Note: 


The index i of tag(i) is not top-relative. A program typically uses the "top" field of Status Word to 
determine which tag(i) field refers to logical top of stack. 


Tag Values: 
00 = Valid 
01 = Zero 
10 = QNaN, SNaN, Infinity, Denormal, and Unsupported Formats 
11 = Empty 


A5151-01 


Figure 4-8. Floating-Point Tag Word 


4.5.3 Floating-Point Status Word 


The 16-bit status word reflects the overall state of the FPU. The status word is shown in Figure 
4-9 and is located in the status register. 
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Busy 


Top of Stack Pointer 
Condition Code 


Error Summary Status 
Stack Flag 


Exception Flags: 

Precision 

Underflow 

Overflow 

Zero Divide 
Denormalized Operand 
Invalid Operation 


ES is set if any unmasked exception bit is set; cleared otherwise. 
See Table 4-7 for interpretation of condition code. 
Top Values: 


000 = Register 0 is Top of Stack 
001 = Register 1 is Top of Stack 


* 


* 


111 = Register 7 is Top of Stack 


For definitions of exceptions, refer to the section entitled, "Exception Handling". 


Note: 

The B-bit (Busy, bit 15) is included for 8087 compatibility. The B-bit reflects the contents of the ES 
bit (bit 7 of the status word). 

Bits 13-11 (TOP) point to the FPU register that is the current top-of-stack. 

The four numeric condition code bits, CO-C3, are similar to the flags in EFLAGS. Instructions that 
perform arithmetic operations update C0-C3 to reflect the outcome. The effects of these 
instructions on the condition codes are summarized in Table 4-7 through Table 4-10. 


A5152-01 


Figure 4-9. Floating-Point Status Word 
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Table 4-7. Floating-Point Condition Code Interpretation 


Instruction Co (S) C3 (2) C1 (A) C2 (C) 
FPREM, FPREM1 Three least significant bits of quotient (See Table 4-8.) Reduction 
0 = complete 


Q2 Qo 


Q1 or O/U# 


1 = incomplete 


FCOM, FCOMP, FCOMPP, 
FTST, FUCOM, FUCOMP, 
FUCOMPP, FICOM, 
FICOMP 


Result of comparison (see Table 4-9) 


Zero or O/U# 


Operand is not 
comparable 


FXAM 


Operand class (see Table 4-10) 


Sign or O/U# 


Operand class 


FCHS, FABS, FXCH, 
FINCTOP, FDECTOP, 
Constant loads, FXTRACT, 
FLD, FILD, FBLD, FSTP 
(ext real) 


FIST, FBSTP, FRNDINT, 
FST, FSTP, FADD, FMUL, 
FDIV, FDIVR, FSUB, 
FSUBR, FSCALE, FSQRT, 
FPATAN, F2XM1, FYL2X, 
FYL2XP1 


FPTAN, FSIN, FCOS, 
FSINCOS 


UNDEFINED 


UNDEFINED 


UNDEFINED 


Zero or O/U# 


Roundup or O/U# 


Roundup or 
O/U#, if C2 = 1 


UNDEFINED 


UNDEFINED 


Reduction 
0 = complete 
1 = incomplete 


FLDENV, FRSTOR 
FINIT 


Each bit loaded from memory 
Clears these bits 


FLDCW, FSTENV, FSTCW, 
FSTSW, FCLEX, FSAVE 


UNDEFINED 


NOTES: 


1. When both IE and SF bits of status word are set, indicating a stack exception, this bit distinguishes 
between stack overflow (C1 = 1) and underflow (C1 = 0). 

2. Reduction: If FRREM or FPREM1 produces a remainder that is less than the modulus, reduction is 
complete. When reduction is incomplete, the value at the top of the stack is a partial remainder, which 
can be used as input to further reduction. For FPTAN, FSIN, FCOS, and FSINCOS, the reduction bit is 
set if the operand at the top of the stack is too large. In this case, the original operand remains at the top 


of the stack. 


3. Roundup: When the PE bit of the status word is set, this bit indicates whether the last rounding in the 


instruction was upward. 


4. UNDEFINED: Do not rely on finding any specific value in these bits. See Section 4.8, Compatibility with 
Future Processors (pg. 4-34). 
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Table 4-8. Condition Code Interpretation after FPREM and FPREM1 Instructions 


Condition Code 


Interpretation after FRREM and FPREM1 


C2 C3 C1 co 
1 X X X Incomplete Reduction: 
further interaction required for complete reduction 
Qi QO Q2 Q MOD8 Complete Reduction: 
0 0 CO, C3, and C1 contain the three 
0 1 least-significant bits of the quotient 
0 2 
0 3 
1 4 
1 5 
1 6 
1 1 1 7 


Table 4-9. Condition Code Resulting from Comparison 


Order C3 C2 co 
TOP > Operand 0 0 
TOP < Operand 0 1 
TOP = Operand 1 0 0 

Unordered 1 1 1 
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Table 4-10. Condition Code Defining Operand Class 


9 
7) 
ie) 
np 
?) 
— 
i?) 
ra) 


Value at TOP 


+ Unsupported 
+ NaN 
- Unsupported 
- NaN 


+ Normal 


+ Infinity 


- Normal 


=—=!-3|;[o0/|o 


- Infinity 

0 +0 

1 + Empty 
-0 


- Empty 


s+/ +] oO;}/oO;oO;o0; =| =~] +=} +), CO] CO} CO] CO 


=![1o/-| = | o | o 


0 
1 
0 + Denormal 
0 


- Denormal 


Bit 7 is the error summary (ES) status bit. The ES bit is set if any unmasked exception bit (bits 
5:0 in the status word) is set; ES is clear otherwise. The FERR# (floating-point error) signal is 
asserted when ES is set. 


Bit 6 is the stack flag (SF). This bit is used to distinguish invalid operations due to stack overflow 
or underflow. When SF is set, bit 9 (C1) distinguishes between stack overflow (C1=1) and under- 
flow (C1=0). 


Table 4-11 shows the six exception flags in bits 5:0 of the status word. Bits 5:0 are set to indicate 
that the FPU has detected an exception while executing an instruction. 


The six exception flags in the status word can be individually masked by mask bits in the FPU 
control word. Table 4-11 lists the exception conditions, and their causes in order of precedence. 
Table 4-11 also shows the action taken by the FPU if the corresponding exception flag is masked. 


An exception that is not masked by the control word causes three things to happen: the corre- 
sponding exception flag in the status word is set, the ES bit in the status word is set, and the 
FERR# output signal is asserted. When the IntelDX2 or IntelDX4 processor attempts to execute 
another floating-point or WAIT instruction, exception 16 occurs or an external interrupt happens 
if the NE=1 in control register 0. The exception condition must be resolved via an interrupt ser- 
vice routine. The FPU saves the address of the floating-point instruction that caused the exception 
and the address of any memory operand required by that instruction in the instruction and data 
pointers. (See Section 4.5.4, “Instruction and Data Pointers.’’) 
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Note that when a new value is loaded into the status word by the FLDENV (load environment) 
or FRSTOR (restore state) instruction, the value of ES (bit 7) and its reflection in the B bit (bit 
15) are not derived from the values loaded from memory. The values of ES and B are dependent 
upon the values of the exception flags in the status word and their corresponding masks in the 
control word. If ES is set in such a case, the FERR# output of the IntelD X2 or IntelDX4 processor 
is activated immediately. 


Table 4-11. FPU Exceptions 


Default Action (if 


Exception Cause exception is masked) 
Invalid Operation on a signaling NaN, unsupported format, Result is a quiet NaN, 
Operation indeterminate form (0*co, 0/0, (+00) + (-cc), etc.), or stack integer indefinite, or BCD 

overflow/underflow (SF is also set). indefinite 
Denormalized | At least one of the operands is denormalized; i.e., it has the Normal processing 
Operand smallest exponent but a non-zero significand. continues 
Zero Divisor The divisor is zero while the dividend is a non-infinite, non-zero | Result is © 
number. 
Overflow The result is too large in magnitude to fit in the specified format. | Result is largest finite 
value or oo 
Underflow The true result is non-zero but too small to be represented in the | Result is denormalized or 
specified format, and, when underflow exception is masked, zero 
denormalization causes loss of accuracy. 
Inexact Result | The true result is not exactly representable in the specified Normal processing 
(Precision) format (e.g., 1/3); the result is rounded according to the continues 


rounding mode. 


4.5.4 Instruction and Data Pointers 


Because the FPU operates in parallel with the ALU (in the IntelDX2 and IntelDX4 processors the 
arithmetic and logic unit (ALU) consists of the base architecture registers), any errors detected 
by the FPU may be reported after the ALU has executed the floating-point instruction that caused 
it. To allow identification of the failing numeric instruction, the IntelDX2 and IntelDX4 proces- 
sors contain two pointer registers that supply the address of the failing numeric instruction and 
the address of its numeric memory operand (if appropriate). 


The instruction and data pointers are provided for user-written error handlers. These registers are 
accessed by the FLDENV (load environment), FSTENV (store environment), FSAVE (save 
state) and FRSTOR (restore state) instructions. Whenever the IntelDX2 and IntelDX4 processors 
decode a new floating-point instruction, it saves the instruction (including any prefixes that may 
be present), the address of the operand (if present) and the opcode. 
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The instruction and data pointers appear in one of four formats depending on the operating mode 
of the IntelDX2 and IntelDX4 processors (Protected Mode or Real Mode) and depending on the 
operand-size attribute in effect (32-bit operand or 16-bit operand). When the IntelDX2 or 
IntelIDX4 processor is in the Virtual-86 Mode, the Real Mode formats are used. Figures 4-10 
through Figure 4-13 show the four formats. The floating-point instructions FLDENV, FSTENV, 
FSAVE and FRSTOR are used to transfer these values to and from memory. Note that the value 
of the data pointer is undefined if the prior floating-point instruction did not have a memory op- 
erand. 


NOTE 
The operand size attribute is the D bit in a segment descriptor. 


32-Bit Protected Mode Format 


31 23 15 7. 0 

ee 
ee 
ee 
ee 
oe 


A5153-01 


Figure 4-10. Protected Mode FPU Instructions and Data Pointer Image in Memory— 
32-Bit Format 
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32-Bit Protected Mode Format 


31 23 15 7 0 

es 
ee 
ee 


Intel Reserved Instruction Pointer 15..0 c 


0000 Instruction Pointer 31..16 | OPCODE 10..0 10 


Intel Reserved Operand Pointer 15..0 14 


0000 Operand 31..16 0000 00000000 18 


A5154-01 


Figure 4-11. Real Mode FPU Instruction and Data Pointer Image in Memory— 
32-Bit Format 


16-Bit Protected Mode Format 


5 7 0 
[ened 
[sted 
esd: 
[ses 
[eed 
[aertoos 
ee, 


A5155-01 


4-28 


Figure 4-12. Protected Mode FPU Instruction and Data Pointer Image in Memory— 
16-Bit Format 
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16-Bit Real Address Mode and 
Virtual-8086 Mode Format 


A5156-01 


Figure 4-13. Real Mode FPU Instruction and Data Pointer Image in Memory— 
16-Bit Format 
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Reserved 
Reserved? 
Rounding Control 
Precision Control 


Reserved +19" after reset or FINIT: 


changeable upon loading the 
control word (CW). Programs 
must ignore this bit. 


Exception Masks: 
Precision 

Underflow 

Overflow 

Zero Divide 
Denormalized Operand 
Invalid Operation 


Precision Control: Rounding Control: 

00-24 bits (single precision) 00-Round to nearest or even 
01- (reserved) 01- Round down (toward -ce) 
10-53 bits (double precision) 10- Round up (toward +oe) 
11-64 bits (extended precision) 11- Chop (truncate toward zero) 


Note: 
See section 4.2.7, "Compatibility," for RESERVED bits. 


A5157-01 


Figure 4-14. FPU Control Word 


4.5.5 FPU Control Word 


The FPU provides several processing options that are selected by loading a control word from 
memory into the control register. Figure 4-14 shows the format and encoding of fields in the con- 
trol word. 


The low-order byte of the FPU control word configures the FPU error and exception masking. 
Bits 5:0 of the control word contain individual masks for each of the six exceptions that the FPU 
recognizes. 


The high-order byte of the control word configures the FPU operating mode, including precision 
and rounding. 
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RC (Rounding Control, bits 11:10) 


RC bits provide for directed rounding and true chop, as well as the unbiased round to 
nearest even mode specified in the IEEE standard. Rounding control affects only those 
instructions that perform rounding at the end of the operation (and thus can generate a 
precision exception); namely, FST, FSTP, FIST, all arithmetic instructions (except 
FPREM, FPREM1, FXTRACT, FABS and FCHS), and all transcendental instructions. 


PC (Precision Control, bits 9:8) 


PC bits can be used to set the FPU internal operating precision of the significand at less than 
the default of 64 bits (extended precision). This can be useful in providing compatibility 
with early generation arithmetic processors of smaller precision. PC affects only the 
instructions ADD, SUB, DIV, MUL, and SQRT. For all other instructions, either the 
precision is determined by the opcode or extended precision is used. 


4.6 DEBUG AND TEST REGISTERS 


4.6.1 Debug Registers 


The six programmer accessible debug registers (Figure 4-15) provide on-chip support for debug- 
ging. Debug registers DR[3:0] specify the four linear breakpoints. The Debug control register 
DR7, is used to set the breakpoints and the Debug Status Register, DR6, displays the current state 
of the breakpoints. The use of the Debug registers is described in Chapter 11, “Debugging Sup- 
port.” 
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Debug Registers 


Linear Breakpoint Address 0 DRO 
Linear Breakpoint Address 1 DR1 
Linear Breakpoint Address 2 DR2 
Linear Breakpoint Address 3 DR3 
Intel Reserved, Do Not Define DR4 
Intel Reserved, Do Not Define DR5 
Breakpoint Status DR6 
Breakpoint Control DR7 


Test Registers 
Cache Test Data TR3 
Cache Test Status TR4 
Cache Test Control TRS 
TLB Test Control TR6 
TLB Test Status TR7 
TLB - Translation Lookaside Buffer 242202-025 


Figure 4-15. Debug and Test Registers 


4.6.2 Test Registers 


The Intel486 processor contains five test registers. Figure 4-15 show the test registers. TR6 and 
TR7 are used to control the testing of the translation lookaside buffer. TR3, TR4 and TRS are 
used for testing the on-chip cache. The use of the test registers is discussed in Appendix B, “Test- 
ability.” 
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SYSTEM REGISTER ORGANIZATION 


There are a few differences regarding the accessibility of the registers in Real and Protected 
Mode. Table 4-12 summarizes these differences. (See Chapter 6, “Protected Mode Architec- 


ture.”’) 
4.7.1. FPU Register Usage 
In addition to the differences listed in Table 4-12, Table 4-13 summarizes the differences for the 
on-chip FPU. 
Table 4-12. Register Usage 
Use in Use in Use in 
Register Real Mode Protected Mode Virtual 8086 Mode 
Load Store Load Store Load Store 
General Registers Yes Yes Yes Yes Yes Yes 
Segment Register Yes Yes Yes Yes Yes Yes 
Flag Register Yes Yes Yes Yes IOPL(1) IOPL 
Control Registers Yes Yes PL = 0 PL=0 No Yes 


LDTR No No PL=0 Yes No No 

TR No No PL:= Yes No No 
Debug Registers Yes Yes PL= PL=0 No No 
Test Registers Yes Yes PL= PL=0 No No 


NOTES: 


1. IOPL: The PUSHF and POPF instructions are made I/O Privileg 


2. PL=0: The registers can be accessed only when the current privilege level is zero. 


Table 4-13. FPU Register Usage Differences 


e Level sensitive in Virtual 8086 Mode. 


Use in Use in Use in 
Register Real Mode Protected Mode Virtual 8086 Mode 
Load Store Load Store Load Store 
FPU Data Registers Yes Yes Yes Yes Yes Yes 
FPU Control Registers Yes Yes Yes Yes Yes Yes 
FPU Status Registers Yes Yes Yes Yes Yes Yes 
FPU Instruction Pointer Yes Yes Yes Yes Yes Yes 
FPU Data Pointer Yes Yes Yes Yes Yes Yes 
4-33 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


4.8 COMPATIBILITY WITH FUTURE PROCESSORS 


In the preceding register descriptions, note certain Intel486 processor register bits are Intel re- 
served. When reserved bits are called out, treat them as fully undefined. This is essential for your 
software compatibility with future processors. Follow the guidelines below: 


1. Do not depend on the states of any undefined bits when testing the values of defined 
register bits. Mask them out when testing. 


2. Do not depend on the states of any undefined bits when storing them to memory or 
another register. 


3. Do not depend on the ability to retain information written into any undefined bits. 
4. When loading registers, always load the undefined bits as zeros. 
5. However, registers that have been previously stored may be reloaded without masking. 


Depending upon the values of undefined register bits makes your software dependent upon the 
unspecified Intel486 processor handling of these bits. Depending on undefined values risks mak- 
ing your software incompatible with future processors that define usages for the Intel486 proces- 
sor-undefined bits. 


NOTE 
Avoid any software dependence on the state of undefined Intel486 Processor 
Register Bits. 
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CHAPTER 5 
REAL MODE ARCHITECTURE 


5.1. INTRODUCTION 


When the Intel486™ processor is powered up or reset, it is initialized in Real Mode. Real Mode 
has the same base architecture as the 8086 processor, except that it allows access to the 32-bit 
register set of the Intel486 processor. The Intel486 processor addressing mechanism, memory 
size, and interrupt handling are identical to those of Real Mode on the 80286 processor. 


All of the Intel486 processor instructions are available in Real Mode (except those instructions 
listed in Section 6.5.4, “Protection and I/O Permission Bitmap”). The default operand size in Real 
Mode is 16 bits, as in the 8086 processor. In order to use the 32-bit registers and addressing 
modes, override prefixes must be used. Also, the segment size on the Intel486 processor in Real 
Mode is 64 Kbytes, forcing 32-bit effective addresses to have a value less than OOOOFFFFH. The 
primary purpose of Real Mode is to enable Protected Mode operation. 


The LOCK prefix on the Intel486 processor, even in Real Mode, is more restrictive than on the 
80286 processor. This is due to the addition of paging on the Intel486 processor in Protected 
Mode and Virtual 8086 Mode. Paging makes it impossible to guarantee that repeated string in- 
structions can be LOCKed. The Intel486 processor cannot require that all pages holding the string 
be physically present in memory. Hence, a Page Fault (exception 14) might have to be taken dur- 
ing the repeated string instruction. Therefore, the LOCK prefix can not be supported during re- 
peated string instructions. 


Table 5-1 lists the only instruction forms in which the LOCK prefix is legal on the Intel486 pro- 
cessor. 


An exception 6 is generated if a LOCK prefix is placed before any instruction form or opcode not 
listed Table 5-1. The LOCK prefix allows indivisible read/modify/write operations on memory 
operands using the instructions Table 5-1. For example, even the ADD Reg, Mem instruction is 
not LOCKable, because the Mem operand is not the destination (and therefore no memory 
read/modify/operation is being performed). 


On the Intel486 processor, repeated string instructions are not LOCKable; therefore, it is not pos- 
sible to LOCK the bus for a long period of time. Therefore, the LOCK prefix is not IOPL-sensi- 
tive on the Intel486 processor. The LOCK prefix can be used at any privilege level, but only on 
the instruction forms listed in Table 5-1. 
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Table 5-1. Instruction Forms in which LOCK Prefix Is Legal 


Opcode Operands (Dest, Source) 
BIT Test and SET/RESET/COMPLEMENT Mem, Reg/immed. 
XCHG Reg, Mem 
CHG Mem, Reg 
ADD, OR, ADC, SBB, AND, SUB, XOR Mem, Reg/immed. 
NOT, NEG, INC, DEC Mem 
CMPXCHG, XADD Mem, Reg 


5.2 MEMORY ADDRESSING 


In Real Mode, the maximum memory size is limited to 1 Mbyte. (See Figure 5-1.) Thus, only ad- 
dress lines A[19:2] are active with this exception: after RESET address lines A[31:20] are high 
during CS-relative memory cycles until an intersegment jump or call is executed. See Section 9.5, 
“Reset and Initialization.” 


1 0 
OFFSET 


MAX LIMIT 
FIXED AT 64K IN 
REAL MODE 


MEMORY OPERAND 


SEGMENT BASE 240950-011 


Figure 5-1. Real Address Mode Addressing 


Because paging is not allowed in Real Mode, the linear addresses are the same as the physical 
addresses. Physical addresses are formed in Real Mode by adding the contents of the appropriate 
segment register, which is shifted left by four bits to create an effective address. This addition 
results in a physical address from O0000000H to 0010FFEFH. This is compatible with 80286 
Real Mode. Because segment registers are shifted left by 4 bits, Real Mode segments always start 
on 16-byte boundaries. 
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All segments in Real Mode are exactly 64-Kbytes long, and may be read, written, or executed. 
The Intel486 processor generates an exception 13 if a data operand or instruction fetch occurs 
past the end of a segment (i.e., if an operand has an offset greater than FFFFH, as when a word 
has a low byte at FFFFH and the high byte at O000H). 


Segments may be overlapped in Real Mode. If a segment does not use all 64 Kbytes, another seg- 
ment can be overlaid on top of the unused portion of the previous segment. This allows the pro- 
grammer to minimize the amount of physical memory needed for a program. 


5.3. RESERVED LOCATIONS 


There are two fixed areas in memory that are reserved in Real Address Mode: the system initial- 
ization area and the interrupt table area. Locations OOOO0OH through 003FFH are reserved for in- 
terrupt vectors. Each one of the 256 possible interrupts has a 4-byte jump vector reserved for it. 
Locations FFFFFFFOH through FFFFFFFFH are reserved for system initialization. 


5.4 INTERRUPTS 


Many of the exceptions discussed in Section 3.15.3, “Maskable Interrupt,” are not applicable to 
Real Mode operation, in particular exceptions 10, 11, 14, and 17, which do not occur in Real 
Mode. Other exceptions have slightly different meanings in Real Mode; Table 5-2 identifies these 
exceptions. 


5.5 SHUTDOWN AND HALT 


The HALT instruction stops program execution and prevents the Intel486 processor from using 
the local bus until restarted via the RESUME instruction. The Intel486 processor is forced out of 
halt by NMI, INTR with interrupts enabled (IF=1), or by RESET. If interrupted, the saved CS:IP 
points to the next instruction after the HLT. 


As in the case of Protected Mode, the shutdown occurs when a severe error is detected that pre- 
vents further processing. In Real Mode, shutdown can occur under the following two conditions: 


e An interrupt or an exception occurs (exceptions 8 or 13) and the interrupt vector is larger 
than the Interrupt Descriptor Table (i.e., there is not an interrupt handler for the interrupt). 


¢ ACALL, INT or PUSH instruction attempts to wrap around the stack segment when SP is 
not even (i.e., pushing a value on the stack when SP = 0001, resulting in a stack segment 
greater than FFFFH). 


An NMI input can bring the processor out of shutdown if the Interrupt Descriptor Table limit is 
large enough to contain the NMI interrupt vector (at least 0017H) and the stack has enough room 
to contain the vector and flag information (i.e., SP is greater than 0005H). If these conditions are 
not met, the Intel486 processor is unable to execute the NMI and executes another shutdown cy- 
cle. In this case, the Intel486 processor remains in the shutdown and can only exit via the RESET 
input. 


| 5-3 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL 


intel. 


Table 5-2. Exceptions with Different Meanings in Real Mode (see Table 5-1) 


, Interrupt . Return Address 
Function Number Related Instructions Lécation 
Interrupt table limit too small 8 INT Vector is not within table limit Before Instruction 


CS, DS, ES, FS, GS Segment 13 Word memory reference beyond offset = | Before Instruction 
overrun exception FFFFH. 

An attempt to execute past the end of CS 

segment. 
SS Segment overrun exception 12 Stack Reference beyond offset = FFFFH | Before Instruction 
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CHAPTER 6 
PROTECTED MODE ARCHITECTURE 


The full capabilities of the Intel486™ processor are available when it operates in Protected Vir- 
tual Address Mode (Protected Mode). Protected Mode vastly increases the linear address space 
to four Gbytes Wie bytes) and allows the processor to run virtual memory programs of almost 
unlimited size (64 terabytes or 2” bytes). In addition Protected Mode allows the Intel486 proces- 
sor to run all of the existing 8086, 80286 and Intel386™ processor software, while providing a 
sophisticated memory management and a hardware-assisted protection mechanism. Protected 
Mode allows the use of additional instructions that support multi-tasking operating systems. The 
base architecture of the Intel486 processor remains the same and the registers, instructions, and 
addressing modes described in the previous chapters are retained. The main difference between 
Protected Mode and Real Mode from a programmer’s view is the increased address space and a 
different addressing mechanism. 


6.1 ADDRESSING MECHANISM 


Like Real Mode, Protected Mode uses two components to form the logical address: a 16-bit se- 
lector is used to determine the linear base address of a segment, then the base address is added to 
a 32-bit effective address to form a 32-bit linear address. The linear address is either used as the 
32-bit physical address, or if paging is enabled, the paging mechanism maps the 32-bit linear ad- 
dress into a 32-bit physical address. 


The difference between the two modes lies in calculating the base address. In Protected Mode the 
selector is used to specify an index into an operating system defined table (see Figure 6-1). The 
table contains the 32-bit base address of a given segment. The physical address is formed by add- 
ing the base address obtained from the table to the offset. 


Paging provides an additional memory management mechanism that operates only in Protected 
Mode. Paging provides a means of managing the very large segments of the Intel486 processor. 
As such, paging operates beneath segmentation. The paging mechanism translates the protected 
linear address that comes from the segmentation unit into a physical address. Figure 6-2 shows 
the complete Intel486 processor addressing mechanism with paging enabled. 
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48/32 BIT POINTER 


SELECTOR OFFSET 


47/31 31/15 


SEGMENT LIMIT 


ay, 


MEMORY OPERAND 
ACCESS RIGHTS 

LIMIT 

BASE ADDRESS 


SEGMENT BASE 
ADDRESS 


UP TO 
4G BYTES 


SEGMENT 
DESCRIPTOR 


Figure 6-1. Protected Mode Addressing 


SELECTED 
SEGMENT 


241944-010 


48-bit Pointer Physical Address 


cree 
0 


15 


Intel486™ Physical 
A Righ 
Paging 
Mechanism 


Memory Operand 


Base Address 


, Page 
Linear Frame 
Segment Address Address 


Descriptor 


4K Bytes 


>< >| 


4K Bytes 


4K Bytes 


4K Bytes 


Physical Page 
4K Bytes 


4K Bytes 


4K Bytes 


|<—_>|<—__> |< > |<—__>|« > |< 


A5211-01 


Figure 6-2. Paging and Segmentation 
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6.2 SEGMENTATION 


6.2.1. Segmentation Introduction 


Segmentation is one method of memory management. Segmentation provides the basis for pro- 
tection. Segments are used to encapsulate regions of memory that have common attributes. For 
example, all of the code of a given program could be contained in a segment, or an operating sys- 
tem table may reside in a segment. All information about a segment is stored in an 8-byte data 
structure called a descriptor. All of the descriptors in a system are contained in tables recognized 
by hardware. 


6.2.2 Terminology 


The following terms are used throughout the discussion of descriptors, privilege levels and pro- 
tection: 


PL: Privilege Level One of the four hierarchical privilege levels. Level 0 is the most 
privileged level and level 3 is the least privileged. Higher privilege 
levels are numerically smaller than lower privilege levels. 


RPL: Requester The privilege level of the original supplier of the selector. RPL is 

Privilege Level determined by the least two significant bits of a selector. 

DPL: Descriptor The least privileged level at which a task may access that descriptor 

Privilege Level (and the segment associated with that descriptor). Descriptor Priv- 
ilege Level is determined by bits 6:5 in the Access Right Byte of a 
descriptor. 

CPL: Current The privilege level at which a task is currently executing, which 

Privilege Level equals the privilege level of the code segment being executed. CPL 


can also be determined by examining the lowest 2 bits of the CS 
register, except for conforming code segments. 


EPL: Effective The effective privilege level is the least privileged of the RPL and 
Privilege Level DPL. Because smaller privilege level values indicate greater priv- 
ilege, EPL is the numerical maximum of RPL and DPL. 


Task One instance of the execution of a program. Tasks are also referred 
to as processes. 
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6.2.3. Descriptor Tables 


6.2.3.1 Descriptor Tables Introduction 


The descriptor tables define all of the segments that are used in an Intel486 processor system (see 
Figure 6-3). There are three types of tables on the Intel486 processor that hold descriptors: the 
Global Descriptor Table, Local Descriptor Table, and the Interrupt Descriptor Table. All of the 
tables are variable length memory arrays. They range in size between 8 bytes and 64 Kbytes. 
Each table can hold up to 8192 8-byte descriptors. The upper 13 bits of a selector are used as an 
index into the descriptor table. The tables have registers associated with them that hold the 32-bit 
linear base address, and the 16-bit limit of each table. 


Each table has a different register associated with it: the GDTR, LDTR, and the IDTR (see Figure 
6-3). The LGDT, LLDT, and LIDT instructions load the base and limit of the Global, Local, and 
Interrupt Descriptor Tables, respectively, into the appropriate register. The SGDT, SLDT, and 
SIDT store the base and limit values. These tables are manipulated by the operating system. 
Therefore, the load descriptor table instructions are privileged instructions. 


15 


LDT DESCR 
FIR SELECTOR 


15 0 
IDT LIMIT 
IDT BASE 
LINEAR ADDRESS 
31 0 


15 0 
GDT LIMIT 
GDT BASE 
LINEAR ADDRESS 
31 0 


o 


LDT LIMIT 


LDT BASE 
LINEAR ADDRESS 


32 
PROGRAM INVISIBLE 
AUTOMATICALLY LOADED 
1 FROM LDT DESCRIPTOR 


IDTR 


GDTR 


240950-014 


Figure 6-3. Descriptor Table Registers 
6.2.3.2 Global Descriptor Table 


The Global Descriptor Table (GDT) contains descriptors that are possibly available to all of the 
tasks in a system. The GDT can contain any type of segment descriptor except for descriptors that 
are used for servicing interrupts (i.e., interrupt and trap descriptors). Every Intel486 processor 
system contains a GDT. Generally the GDT contains code and data segments used by the oper- 
ating systems and task state segments, and descriptors for the LDTs in a system. 
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The first slot of the Global Descriptor Table corresponds to the null selector and is not used. The 
null selector defines a null pointer value. 


6.2.3.3 Local Descriptor Table 


LDTs contain descriptors that are associated with a given task. Generally, operating systems are 
designed so that each task has a separate LDT. The LDT may contain only code, data, stack, task 
gate, and call gate descriptors. LDTs provide a mechanism for isolating a given task’s code and 
data segments from the rest of the operating system, while the GDT contains descriptors for seg- 
ments that are common to all tasks. A segment cannot be accessed by a task if its segment de- 
scriptor does not exist in either the current LDT or the GDT. This provides both isolation and 
protection for a task’s segments, while still allowing global data to be shared among tasks. 


Unlike the 6-byte GDT or IDT registers which contain a base address and limit, the visible por- 
tion of the LDT register contains only a 16-bit selector. This selector refers to a Local Descriptor 
Table descriptor in the GDT. 


6.2.3.4 Interrupt Descriptor Table 


The third table needed for Intel486 processor systems is the Interrupt Descriptor Table (see Fig- 
ure 6-4). The IDT contains the descriptors that point to the location of up to 256 interrupt service 
routines. The IDT may contain only task gates, interrupt gates, and trap gates. The IDT should be 
at least 256 bytes in order to hold the descriptors for the 32 Intel Reserved Interrupts. Every in- 
terrupt used by a system must have an entry in the IDT. The IDT entries are referenced via INT 
instructions, external interrupt vectors, and exceptions (see Section 3.15, “Interrupts,”’). 


> Memory 


Gate for 
Interrupt #n 


Gate for 
Interrupt #n-1 


a oie 
Descriptor 
Processor 
Table (DT) 
Gate for 
Interrupt #1 


Limit Increasing 
a Gate for Memory 


Interrupt #0 Addresses 
IDT Base 


A5212-01 


Figure 6-4. Interrupt Descriptor Table Register Use 
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6.2.4 Descriptors 


6.2.4.1 Descriptor Attribute Bits 


The object to which the segment selector points to is called a descriptor. Descriptors are eight- 
byte quantities that contain attributes about a given region of linear address space (i.e., a seg- 
ment). These attributes include the 32-bit base linear address of the segment; the 20-bit length 
and granularity of the segment; the protection level; read, write or execute privileges; the default 
size of the operands (16-bit or 32-bit); and the type of segment. All attribute information about a 
segment is contained in 12 bits in the segment descriptor. All segments on the Intel486 processor 
have three attribute fields in common: the Present (P) bit, the Descriptor Privilege Level (DPL) 
bit, and the Segment (S) bit. The P bit is 1 if the segment is loaded in physical memory. If P=0, 
any attempt to access this segment causes a not present exception (exception 11). The DPL is a 
two-bit field that specifies the protection level 0-3 associated with a segment. 


The Intel486 processor has two main categories of segments: system segments and non-system 
segments (for code and data). The S bit in the segment descriptor determines if a given segment 
is a system segment or a code or data segment. If the S bit is 1, the segment is either a code or 
data segment. If it is 0, the segment is a system segment. 


6.2.4.2 Intel486™ Processor Code, Data Descriptors (S=1) 


Figure 6-5 shows the general format of a code and data descriptor and Table 6-1 illustrates how 
the bits in the Access Rights Byte are interpreted. The Access Rights Bytes are bits 31:24 asso- 
ciated with the segment limit. 


Code and data segments have several descriptor fields in common. The accessed (A) bit is set 
whenever the processor accesses a descriptor. The A bit is used by operating systems to keep us- 
age statistics on a given segment. The G bit, or granularity bit, specifies if a segment length is 
byte-granular or page-granular. Intel486 processor segments can be one Mbyte long with byte 
granularity (G=0) or four Gbytes with page granularity (G=1), (i.e., 2) pages, each page 
4 Kbytes long). The granularity is unrelated to paging. An Intel486 processor system can consist 
of segments with byte granularity and page granularity, whether or not paging is enabled. 


The executable (E) bit tells if a segment is a code or data segment. A code segment (E=1, S=1) 
may be execute-only or execute/read as determined by the Read (R) bit. Code segments are exe- 
cute-only if R=0, and execute/read if R=1. Code segments may never be written to. 


NOTE 
Code segments can be modified via aliases. Aliases are writeable data 
segments that occupy the same range of linear address space as the code 
segment. 


The D bit indicates the default length for operands and effective addresses. If D=1, 32-bit oper- 
ands and 32-bit addressing modes are assumed. When D=0, 16-bit operands and 16-bit address- 
ing modes are assumed. All existing 80286 code segments execute on the Intel486 processor 
when the D bit is set 0. 
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Another attribute of code segments is determined by the conforming (C) bit. Conforming seg- 
ments, indicated when C=1, can be executed and shared by programs at different privilege levels 
(see Section 6.3, “Protection’’). 


a 0 Byte 

Segment Base 15...0 Segment Limit 15...0 es 
Base G/}D]0 | AVL Limit |P | DPL |S Type |A | Base 23...16 +4 
31.24 19...16 | | 


BASE Base Address of the segment 


LIMIT The length of the segment 

e Present Bit 1=Present, O=Not Present 

DPL Descriptor Privilege Level 0-3 

Ss Segment Descriptor 0=System Descriptor, 1=Code or Data Segment Descriptor 
TYPE Type of Segment 

A Accessed Bit 

G Granularity Bit 1=Segment length is page granular, O=Segment length is byte granular 
D Default Operation Size (recognized in code segment descriptors only) 


1=32-bit segment, 0=16-bit segment 


0 Bit must be zero (0) for compatibility with future processors 
AVL Available field for user or OS 
NOTE: 


In a maximum-size segment (i.e., a segment with G=1 and segment limit 19...0=-FFFFH), the lowest 12 bits of the seg- 
ment base should be zero (i.e., segment base 11...000=000H). 


Figure 6-5. Segment Descriptors 
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Table 6-1. Access Rights Byte Definition for Code and Data Descriptions 


Bit . 
Position Name Function 
7 Present (P) P=1 Segment is mapped into physical memory. 
P=0 _ |No mapping to physical memory exits, base and limit 
are not used. 
6-5 Descriptor Privilege Segment privilege attribute used in privilege tests. 
Level (DPL) 
4 Segment S=1 Code or Data (includes stacks) segment descriptor. 
Descriptor (S) S$=0_ | System Segment Descriptor or Gate Descriptor. 
If Data Segment (S = 1, E = 0) 
Executable (E) E=0_ | Descriptor type is data segment 
2 Expansion ED =0 | Expand up segment, offsets must be < limit. 
Direction (ED) ED=1 | Expand down segment, offsets must be > limit. 
1 Writeable (W) W=0_ | Data segment may not be written to. 


W=1_ | Data segment may be written to. 
If Code Segment (S = 1, E = 1) 


Executable (E) E=1 Descriptor type is code segment 


Conforming (C) C=1 | Code segment may only be executed when CPL = 
DPL and CPL remains unchanged. 


1 Readable (R) R=0_ | Code segment may not be read. 
R=1 Code segment may be read. 
0 Accessed (A) A=0_ | Segment has not been accessed. 
A=1 Segment selector has been loaded into segment 


register or used by selector test instructions. 


Segments identified as data segments (E=0, S=1) are used for two types of Intel486 processor 
segments: stack and data segments. The expansion direction (ED) bit specifies if a segment ex- 
pands downward (stack) or upward (data). If a segment is a stack segment, all offsets must be 
greater than the segment limit. On a data segment, all offsets must be less than or equal to the 
limit. In other words, stack segments start at the base linear address plus the maximum segment 
limit and grow down to the base linear address plus the limit. On the other hand, data segments 
start at the base linear address and expand to the base linear address plus limit. 


The write W bit controls the ability to write into a segment. Data segments are read-only if W=0. 
The stack segment must have W=1. 


The B bit controls the size of the stack pointer register. If B=1, then PUSHes, POPs, and CALLs 
all use the 32-bit ESP register for stack references and assume an upper limit of FRFFFFFFH. If 
B=0, stack instructions all use the 16-bit SP register and assume an upper limit of FFFFH. 
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6.2.4.3 System Descriptor Formats 


System segments describe information about operating system tables, tasks, and gates. Figure 6-6 
shows the general format of system segment descriptors, and the various types of system seg- 
ments. Intel486 processor system descriptors contain a 32-bit base linear address and a 20-bit seg- 
ment limit. 80286 system descriptors have a 24-bit base address and a 16-bit segment limit. 80286 
system descriptors are identified by the upper 16 bits being all zero. 


31 16 0 Byte 


Segment Base 15...0 Segment Limit 15...0 a 


Base 31..24 | G|0O | 0/0 Limit |P | DPL |0 Type | Base 23...16 +4 
19...16 | | | 


Type Defines Type Defines 
0 Invalid 8 Invalid 
1 Available 80286 TSS 9 Available Intel486 processor TSS 
2 LDT A Undefined (Intel Reserved) 
3 Busy 80286 TSS B Busy Intel486 processor TSS 
4 80286 call gate Cc Intel486 processor call gate 
5 Task Gate (for 80286, Intel486™ processor task) D Undefined (Intel Reserved) 
6 80286 interrupt gate E Intel486 processor 
7 80286 trap gate F Intel486 processor 


Figure 6-6. System Segment Descriptors 
6.2.4.4 LDT Descriptors (S=0, TYPE=2) 


LDT descriptors (S=0, TYPE=2) contain information about Local Descriptor Tables. LDTs con- 
tain a table of segment descriptors, unique to a particular task. Because the instruction to load the 
LDTR is only available at privilege level 0, the DPL field is ignored. LDT descriptors are only 
allowed in the Global Descriptor Table (GDT). 


6.2.4.5 TSS Descriptors (S=0, TYPE=1, 3, 9, B) 


A Task State Segment (TSS) descriptor contains information about the location, size, and privi- 
lege level of a Task State Segment (TSS). A TSS in turn is a special fixed format segment that 
contains all the state information for a task and a linkage field to permit nesting tasks. The TYPE 
field is used to indicate whether the task is currently busy (i.e., on a chain of active tasks) or the 
TSS is available. The TYPE field also indicates if the segment contains an 80286 processor TSS 
or an Intel486 processor TSS. The Task Register (TR) contains the selector that points to the cur- 
rent Task State Segment. 
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6.2.4.6 Gate Descriptors (S=0, TYPE=4-7, C, F) 


Gates are used to control access to entry points within the target code segment. The various types 
of gate descriptors are call gates, task gates, interrupt gates, and trap gates. Gates provide a level 
of indirection between the source and destination of the control transfer. This indirection allows 
the processor to automatically perform protection checks. It also allows system designers to 
control entry points to the operating system. Call gates are used to change privilege levels (see 
Section 6.3, “Protection’”’), task gates are used to perform a task switch, and interrupt and trap 
gates are used to specify interrupt service routines. 


Figure 6-7 shows the format of the four types of gate descriptors. Call gates are primarily used to 
transfer program control to a more privileged level. The call gate descriptor consists of three 
fields: the access byte, a long pointer (selector and offset) that points to the start of a routine, and 
a word count that specifies how many parameters are to be copied from the caller's stack to the 
stack of the called routine. The word count field is only used by call gates when there is a change 
in the privilege level; other types of gates ignore the word count field. 


31 24 16 8 5 0 Byte 
Selector Offset 15...0 far as 
Word 
Offset 31...16 P | DPL |0 Type 0/040] Count +4 


Gate Descriptor Fields 
Name Value Description 


Type 4 80286 call gate 

Task gate (for 80286 or Intel486 processor task) 
80286 interrupt gate 

80286 trap gate 

Intel486 processor call gate 

Intel486 processor interrupt gate 

Intel486 processor trap gate 

Descriptor contents are not valid 


-=-omnmmoanoa 


Descriptor contents are valid 


DPL—least privileged level at which a task may access the gate. WORD COUNT 0-31—the number of parameters to 
copy from caller's stack to the called procedure's stack. The parameters are 32-bit quantities for Intel486 processor 
gates, and 16-bit quantities for 80286 gates. 


DESTINATION 16-bit Selector to the target code segment 
SELECTOR selector or 
Selector to the target task state segment for task gate 
DESTINATION offset Entry point within the target code segment 
OFFSET 16-bit 80286 


32-bit Intel486 processor 


Figure 6-7. Gate Descriptor Formats 


6-10 


intel @ PROTECTED MODE ARCHITECTURE 


Interrupt and trap gates use the destination selector and destination offset fields of the gate de- 
scriptor as a pointer to the start of the interrupt or trap handler routines. The difference between 
interrupt gates and trap gates is that the interrupt gate disables interrupts (resets the IF bit), where- 
as the trap gate does not. 


Task gates are used to switch tasks. Task gates may only refer to a task state segment (see Section 
6.3.6, “Task Switching’’). Therefore, only the destination selector portion of a task gate descriptor 
is used, and the destination offset is ignored. 


Exception 13 is generated when a destination selector does not refer to a correct descriptor type, 
i.e., a code segment for an interrupt, trap or call gate, or a TSS for a task gate. 


The access byte format is the same for all gate descriptors. P=1 indicates that the gate contents 
are valid. P=0 indicates the contents are not valid and causes exception 11 when referenced. DPL 
is the descriptor privilege level and specifies when this descriptor may be used by a task (see Sec- 
tion 6.3, “Protection’’). The S field, bit 4 of the access rights byte, must be 0 to indicate a system 
control descriptor. The type field specifies the descriptor type as indicated in Figure 6-7. 


6.2.4.7 Differences Between Intel486™ Processor and 80286 Descriptors 


In order to provide operating system compatibility between 80286 and Intel486 processors, the 
Intel486 processor supports all of the 80286 segment descriptors. Figure 6-8 shows the general 
format of an 80286 system segment descriptor. The only differences between 80286 and Intel486 
processor descriptor formats are that the values of the type fields, and the limit and base address 
fields have been expanded for the Intel486 processor. The 80286 system segment descriptors 
contain a 24-bit base address and 16-bit limit, while the Intel486 processor system segment de- 
scriptors have a 32-bit base address, a 20-bit limit field, and a granularity bit. 


By supporting 80286 system segments, the Intel486 processor is able to execute 80286 applica- 
tion programs on an Intel486 processor operating system. This is possible because the Intel486 
processor automatically understands which descriptors are 80286-style descriptors and which are 
Intel486 processor-style descriptors. In particular, if the upper word of a descriptor is zero, then 
that descriptor is an 80286-style descriptor. 


The only other differences between 80286-style descriptors and Intel486 processor descriptors 
are the interpretation of the word count field of call gates and the B bit. The word count field spec- 
ifies the number of 16-bit quantities to copy for 80286 call gates and 32-bit quantities for Intel486 
processor call gates. The B bit controls the size of PUSHes when using a call gate; if B=0 PUSHes 
are 16 bits, if B=1 PUSHes are 32 bits. 
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Intel Reserved P | DPL |S Type Base 23...16 +4 
Set to 0 | | | | 

BASE Base Address of the segment 

LIMIT The length of the segment 

P Present Bit: 1=Present, 0=Not Present 

DPL Descriptor Privilege Level 0-3 

Ss System Descriptor: 0=System, 1=User 

TYPE Type of Segment 


Figure 6-8. 80286 Code and Data Segment Descriptors 


6.2.4.8 Selector Fields 


A selector in Protected Mode has three fields: Local or Global Descriptor Table Indicator (TI), 
Descriptor Entry Index (Index), and Requester (the selector’s) Privilege Level (RPL) as shown in 
Figure 6-9. The TI bits select one of two memory-based tables of descriptors (the Global Descrip- 
tor Table or the Local Descriptor Table). The Index selects one of 8 K descriptors in the appro- 
priate descriptor table. The RPL bits allow high speed testing of the selector’s privilege attributes. 


6.2.4.9 Segment Descriptor Cache 


In addition to the selector value, every segment register has a segment descriptor cache register 
associated with it. Whenever a segment register’s contents are changed, the 8-byte descriptor as- 
sociated with that selector is automatically loaded (cached) on the chip. Once loaded, all refer- 
ences to that segment use the cached descriptor information instead of re-accessing the 
descriptor. The contents of the descriptor cache are not visible to the programmer. Because de- 
scriptor caches only change when a segment register is changed, programs that modify the de- 
scriptor tables must reload the appropriate segment registers after changing a descriptor’s value. 


6.2.4.10 | Segment Descriptor Register Settings 


The contents of the segment descriptor cache vary depending on the mode in which the Intel486 
processor is operating. When operating in Real Address Mode, the segment base, limit, and other 
attributes within the segment cache registers are defined as shown in Figure 6-10. For compati- 
bility with the 8086 architecture, the base is set to 16 times the current selector value, the limit is 
fixed at OOOOFFFFH, and the attributes are fixed to indicate that the segment is present and fully 
usable. In Real Address Mode, the internal “privilege level” is always fixed to the highest level, 
level 0, so I/O and other privileged opcodes may be executed. 


PROTECTED MODE ARCHITECTURE 


Selector 


Segment 


15 he ZB 4A 
TI | RPL 
Register 1/1141 


Table Indicator 
Tl=1 


Descriptor 


aaa umber a ae 


0 ) Null 
Local Descriptor Global Descriptor 
Table Table 
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Figure 6-9. Example Descriptor Selection 
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SEGMENT DESCRIPTOR CACHE REGISTER CONTENTS 
32 -BIT BASE 32 - BIT LIMIT OTHER ATTRIBUTES 
(UPDATED DURING SELECTOR (FIXED) (FIXED) 


LOAD INTO SEGMENT REGISTER) 


CONFORMING PRIVILEGE 
STACK SIZE 
EXECUTABLE 
WRITEABLE 
READABLE 
EXPANSION DIRECTION 
GRANULARITY 


ACCESSED 

PRIVILEGE LEVEL 

PRESENT | 
BASE LIMIT 1 


cs 16X CURRENT CS SELECTOR’ OOOOFFFFH Y|-|N 
$s 16X CURRENT SS SELECTOR OOOOFFFFH N[ w] - 
DS 16X CURRENT DS SELECTOR OQOOOFFFFH N 2 
ES 16X CURRENT ES SELECTOR OQOOOFFFFH N = 
FS 16X CURRENT FS SELECTOR OOOOFFFFH N|{ -| - 
GS 16X CURRENT GS SELECTOR OOOOFFFFH N = 
240950-017 

Key: 

Y =yes D = expand down 

N  =no B = byte granularity 

0 = =privilege level 0 P = page granularity 

1 = privilege level 1 W = push/pop 16-bit words 

2 =privilege level 2 F = push/pop 32-bit dwords 

3. =privilege level 3 — = does not apply to that segment cache register 


U = expand up 


CALL, or intersegment JMP, or INT). (See Figure 6-12 for an example.) 


TExcept the 32-bit CS base is initialized to FFFFFOOOH after reset until first intersegment control transfer (i.e., intersegment 


Figure 6-10. Segment Descriptor Caches for Real Address Mode 
(Segment Limit and Attributes Are Fixed) 


When operating in Protected Mode, the segment base, limit, and other attributes within the seg- 
ment cache registers are defined as shown in Figure 6-11. In Protected Mode, each of these fields 
are defined according to the contents of the segment descriptor indexed by the selector value 


loaded into the segment register. 


When operating in a Virtual 8086 Mode within the Protected Mode, the segment base, limit, and 
other attributes within the segment cache registers are defined as shown in Figure 6-12. For com- 
patibility with the 8086 architecture, the base is set to sixteen times the current selector value, the 
limit is fixed at OOOOFFFFH, and the attributes are fixed so as to indicate the segment is present 
and fully usable. The virtual program executes at lowest privilege level, level 3, to allow trapping 


of all IOPL-sensitive instructions and level-O-only instructions. 
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CONFORMING PRIVILEGE 
STACK SIZE 
EXECUTABLE 
WRITEABLE 
READABLE 
EXPANSION DIRECTION 
GRANULARITY 
ACCESSED 
PRIVILEGE LEV 


PRESENT 
BASE LIMIT 1 


SEGMENT DESCRIPTOR CACHE REGISTER CONTENTS 
32 - BIT BASE 32 - BIT LIMIT OTHER ATTRIBUTES 
(UPDATED DURING (UPDATED DURING (UPDATED DURING 
SELECTOR LOAD INTO SELECTOR LOAD INTO SELECTOR LOAD INTO 
SEGMENT REGISTER) SEGMENT REGISTER) SEGMENT REGISTER) 


(special case for SS) 


- = does not apply to that segment cache register 


240950-018 

Key: 
Y = fixed yes 
N = fixed no 
d.. «= per segment descriptor 
p = per segment descriptor; descriptor must indicate “present” to avoid exception 11 

(exception 12 in case of SS) 
r = per segment descriptor, but descriptor must indicate “readable” to avoid exception 13 

(special case for SS) 
w = per segment descriptor, but descriptor must indicate “writeable” to avoid exception 13 


Figure 6-11. Segment Descriptor Caches for Protected Mode (Loaded per Descriptor) 
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cwnrst 90 2 XK 


Key: 


SEGMENT DESCRIPTOR CACHE REGISTER CONTENTS 
32-BIT BASE 32 - BIT LIMIT OTHER ATTRIBUTES 
(UPDATED DURING SELECTOR (FIXED) (FIXED) 


LOAD INTO SEGMENT REGISTER) 


CONFORMING PRIVILEGE 
STACK SIZE 
EXECUTABLE 
WRITEABLE 
READABLE 
EXPANSION DIRECTION 
GRANULARITY 


ACCESSED 
PRIVILEGE LEVEL 
PRESENT 
BASE LIMIT ul 


16X CURRENT CS SELECTOR | OOOOFFFFH 
16X CURRENT SS SELECTOR OOOOFFFFH 
16X CURRENT DS SELECTOR 
16X CURRENT ES SELECTOR 
16X CURRENT FS SELECTOR 


OOOOFFFFH 


cs. | 16X CURRENT GS SELECTOR OOOOFFFFH - 
240950-019 
= yes D = expand down 
=no B = byte granularity 
= privilege level 0 P = page granularity 
= privilege level 1 W = push/pop 16-bit words 
= privilege level 2 F = push/pop 32-bit dwords 
= privilege level 3 — = does not apply to that segment cache register 
= expand up 


Figure 6-12. Segment Descriptor Caches for Virtual 8086 Mode within Protected Mode 
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6.3 PROTECTION 


6.3.1 Protection Concepts 


The Intel486 processor has four levels of protection that support multi-tasking by isolating and 
protecting user programs from each other and the operating system. The privilege levels control 
the use of privileged instructions, I/O instructions, and access to segments and segment descrip- 
tors. Unlike traditional processor-based systems, in which this protection is achieved only 
through the use of complex external hardware and software, the Intel486 processor provides the 
protection as part of its integrated Memory Management Unit. The Intel486 processor offers an 
additional type of protection on a page basis, when paging is enabled. See Section 6.4.3, “Page 
Level Protection (R/W, U/S Bits).” 


The four-level hierarchical privilege system is illustrated in Figure 6-13. It is an extension of the 
user/supervisor privilege mode commonly used by minicomputers. The user/supervisor mode is 
fully supported by the Intel486 processor paging mechanism. The privilege levels (PLs) are num- 
bered 0 through 3. Level 0 is the most privileged or trusted level. 


APPLICATIONS 
PROCESSOR 
ENFORCED 
SOFTWARE 


INTERFACES OS EXTENSIONS 


PKERNEL 
PL-o 
MOST 
HIGH SPEED PRIVILEGED 
OPERATING 
SYSTEM 
INTERFACE 


240950-020 


Figure 6-13. Four-Level Hierarchical Protection 


6.3.2 Rules of Privilege 


The Intel486 processor controls access to both data and procedures between levels of a task, ac- 
cording to the following rules. 


¢ Data stored in a segment with privilege level p can be accessed only by code executing at a 
privilege level at least as privileged as p. 


¢ Acode segment/procedure with privilege level p can only be called by a task executing at 
the same or a lesser privilege level than p. 
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6.3.3 Privilege Levels 


6.3.3.1 Task Privilege 


At any point in time, a task on the Intel486 processor always executes at one of the four privilege 
levels. The current privilege level (CPL) specifies the task’s privilege level. A task's CPL may be 
changed only by control transfers through gate descriptors to a code segment with a different 
privilege level (see Section 6.3.4, “Privilege Level Transfers”). Thus, an application program 
running at PL =3 may call an operating system routine at PL = | (via a gate), which would cause 
the task's CPL to be set to 1 until the operating system routine finishes. 


6.3.3.2 Selector Privilege (RPL) 


The privilege level of a selector is specified by the RPL field. The RPL is the two least significant 
bits of the selector. The selector's RPL is used only to establish a less trusted privilege level than 
the current privilege level for the use of a segment. This level is called the task's effective privi- 
lege level (EPL). The EPL is defined as the least privileged (i.e., numerically larger) level of a 
task's CPL and a selector's RPL. Thus, if selector's RPL = 0 then the CPL always specifies the 
privilege level for making an access using the selector. On the other hand, if RPL = 3, a selector 
can only access segments at level 3 regardless of the task's CPL. The RPL is most commonly used 
to verify that pointers passed to an operating system procedure do not access data that is of higher 
privilege than the procedure that originated the pointer. Because the originator of a selector can 
specify any RPL value, the Adjust RPL (ARPL) instruction is provided to force the RPL bits to 
the originator's CPL. 


6.3.3.3 I/O Privilege and I/O Permission Bitmap 


The I/O privilege level (IOPL, a 2-bit field in the EFLAG register) defines the least privileged 
level at which I/O instructions can be unconditionally performed. I/O instructions can be uncon- 
ditionally performed when CPL = IOPL. (The I/O instructions are IN, OUT, INS, OUTS, REP 
INS, and REP OUTS.) When CPL > IOPL and the current task is associated with a 286 TSS, at- 
tempted I/O instructions cause an exception 13 fault. When CPL > IOPL and the current task is 
associated with an Intel486 processor TSS, the I/O permission bitmap (part of an Intel486 pro- 
cessor TSS) is consulted on whether I/O to the port is allowed; otherwise an exception 13 fault is 
generated. For diagrams of the I/O Permission Bitmap, refer to Figures 6-14 and 6-15. For further 
information on how the I/O Permission Bitmap is used in Protected Mode or in Virtual 8086 
Mode, refer to Section 6.5.4, “Protection and I/O Permission Bitmap.” 


The I/O privilege level (IOPL) also affects whether several other instructions can be executed or 
whether an exception 13 fault should be generated. These instructions, called “IOPL-sensitive” 
instructions, are CLI and STI. (Note that the LOCK prefix is not IOPL-sensitive on the Intel486 
processor.) 
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31 1615 0 


fooooso000000000 | ______Back Link ]0 
a = 
Foooossoo00000o <P SSSSCSCSCSC~« SOY 
a = ( 
fooooooono0oIo PC SSTIIO. 7 CPL, 1,2 
PePDOSCSCSCSSCSC«*dCSA 
Foooosoo0000000 «| SSSCSCSCSC~ S| TB 
ore iC 
2 
[______EFLAGS____———SSS——~d2 


TSS Base 


Current 
oo), “ee 
40 State 


48 
4c 
50 
54 
58 
5C 
60 
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Available System Status, etc., 68 Debug Trap Bit 
inside Intel486™ Processor TSS 


rai eaea__ tots 8]7__o}2 
6356 Bit_Map_Offset 
3568 


po Offset + 10 
Rights | Limit al 


: /O Permission Bitmap 
Access Rights 65407 (one bit per I/O port, [Offset + 1FEC 
Program 65439 | sO ffset + 1FFO 


bit may be truncated 
using TSS limit) 


Invisible «Offset + 1FF4 
6550385472 | Offset + IFS 

Task Register Ce 65504 | Offset + 1FFC 
‘FFH'} Offset + 2000 


TSS Limit = Offset + 2000H 


15 0 


31. Intel486™ Processor TSS Descriptor (in GDT) 0 


Segment Base 15..0 Segment Base 15..0 
Base Limit 
sta] oo} aa]? Peto 


Type=9: Available Intel486™ Processor TSS 
Type=B: Busy Intel486™ Processor TSS 


Note: 
BIT_MAP_OFFSET must be < DFFFH 


A5214-01 


Figure 6-14. Intel486™ Processor TSS and TSS Registers 
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Figure 6-15. Sample I/O Permission Bit Map 


The IOPL also affects whether the IF (interrupts enable flag) bit can be changed by loading a val- 
ue into the EFLAGS register. When CPL 2 IOPL, the IF bit can be changed by loading a new 
value into the EFLAGS register. When CPL > IOPL, the IF bit cannot be changed by a new value 
POPed into (or otherwise loaded into) the EFLAGS register; the IF bit remains unchanged and 
no exception is generated. 


6.3.3.4 Privilege Validation 


The Intel486 processor provides several instructions to speed pointer testing and help maintain 
system integrity by verifying that the selector value refers to an appropriate segment. Table 6-2 
summarizes the selector validation procedures available for the Intel486 processor. 


Table 6-2. Pointer Test Instructions 


Instruction Operands Function 


ARPL Selector, Register | Adjust Requested Privilege Level: adjusts the RPL of the selector to 
the numeric maximum of current selector RPL value and the RPL value 
in the register. Set zero flag if selector RPL was changed. 


VERR Selector VERify for Read: sets the zero flag if the segment referred to by the 
selector can be read. 
VERW Selector VERify for Write: sets the zero flag if the segment referred to by the 
selector can be written. 
LSL Register, Selector | Load Segment Limit: reads the segment limit into the register if 
privilege rules and descriptor type allow. Set zero flag if successful. 
LAR Register, Selector | Load Access Rights: reads the descriptor access rights byte into the 


register if privilege rules allow. Set zero flag if successful. 


This pointer verification prevents this common problem: An application at PL = 3 calls an oper- 
ating systems routine at PL = 0, and then passes the operating system routine a “bad” pointer that 
corrupts a data structure belonging to the operating system. This problem can be avoided if the 
operating system routine uses the ARPL instruction to ensure that the RPL of the selector has no 
greater privilege than that of the caller. 
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6.3.3.5 Descriptor Access 


There are two types of segment accesses: those involving code segments such as control transfers, 
and those involving data accesses. Determining the ability of a task to access a segment requires 
determining the type of segment to be accessed, the instruction used, the type of descriptor used, 
and CPL, RPL, and DPL, as described above. 


Any time an instruction loads data segment registers (DS, ES, FS, GS) the Intel486 processor 
makes protection validation checks. Selectors loaded in the DS, ES, FS, GS registers must refer 
only to data segments or readable code segments. (The data access rules are specified in Section 
6.3.2, “Rules of Privilege.” The only exception to those rules is readable conforming code seg- 
ments, that can be accessed at any privilege level.) Finally, the privilege validation checks are 
performed. The CPL is compared to the EPL, and if the EPL is more privileged than the CPL, an 
exception 13 (general protection fault) is generated. 


The rules for the stack segment are slightly different than those for data segments. Instructions 
that load selectors into the SS must refer to data segment descriptors for writeable data segments. 
The DPL and RPL must equal the CPL. All other descriptor types and privilege level violations 
cause exception 13. A stack not present fault causes exception 12. Note that an exception 11 is 
used for a not-present code or data segment. 


6.3.4 Privilege Level Transfers 


Inter-segment control transfers occur when a selector is loaded in the CS register. In a typical sys- 
tem, most of these transfers are the result of a call or a jump to another routine. There are five 
types of control transfers, which are summarized in Table 6-3. Many of these transfers result in 
a privilege level transfer. Changing privilege levels is done only via control transfers, by using 
gates, task switches, and interrupt or trap gates. 


Control transfers can only occur if the operation that loaded the selector references the correct 
descriptor type. Any violation of these descriptor usage rules causes an exception 13 (e.g., JMP 
through a call gate, or IRET from a normal subroutine call). 


To provide further system security, all control transfers are also subject to the privilege rules. 
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Table 6-3. Descriptor Types Used for Control Transfer 


intel. 


F Descriptor Descriptor 
Control Transfer Types Operation Types Referenced Table 
Intersegment within the same privilege level | JMP, CALL, RET, IRET Code Segment GDT/LDT 
Intersegment to the same or higher privilege | CALL Call Gate GDT/LDT 
level 
Interrupt within task may change CPL Interrupt Instruction, Trap or Interrupt IDT 
Exception, External Gate 
Interrupt 
Intersegment to a lower privilege level RET, IRET) Code Segment GDT/LDT 
nanaee eaeRt) CALL, JMP Task State GDT 
Segment 
Task Switch CALL, JMP Task Gate GDT/LDT 
IRET® Task Gate IDT 
Interrupt Instruction, 
Exception, External 
Interrupt 
NOTES: 


1. NT (Nested Task bit of flag register) = 0 
2. NT (Nested Task bit of flag register) = 1 


The privilege rules require that: 


¢ Privilege level transitions can only occur via gates. 


¢ JMPs can be made to a non-conforming code segment with the same privilege or to a 


conforming code segment with greater or equal privilege. 


¢ CALLs can be made to a non-conforming code segment with the same privilege or via a 


gate to a more privileged level. 


¢ Interrupts handled within the task obey the same privilege rules as CALLs. 


¢ Conforming code segments are accessible by privilege levels that are the same or less 
privileged than the conforming-code segment’s DPL. 


¢ Both the requested privilege level (RPL) in the selector pointing to the gate and the task’s 
CPL must be of equal or greater privilege than the gate’s DPL. 


¢ The code segment selected in the gate must be the same or more privileged than the task’s 


CPL. 


¢ Return instructions that do not switch tasks can only return control to a code segment with 


the same or less privilege. 


¢ Task switches can be performed by a CALL, JMP, or INT that references either a task gate 
or task state segment who’s DPL is less privileged or the same privilege as the old task’s 


CPL. 
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Any control transfer that changes CPL within a task causes a change of stacks as a result of the 
privilege level change. The initial values of SS:ESP for privilege levels 0, 1, and 2 are retained 
in the task state segment (see Section 6.3.6, “Task Switching”). During a JMP or CALL control 
transfer, the new stack pointer is loaded into the SS and ESP registers and the previous stack 
pointer is pushed onto the new stack. 


When returning to the original privilege level, use of the lower-privileged stack is restored as part 
of the RET or IRET instruction operation. For subroutine calls that pass parameters on the stack 
and cross privilege levels, a fixed number of words (as specified in the gate's word count field) 
are copied from the previous stack to the current stack. The inter-segment RET instruction with 
a stack adjustment value correctly restores the previous stack pointer upon return. 


6.3.5 Call Gates 


Gates provide protected, indirect CALLs. One of the major uses of gates is to provide a secure 
method of privilege transfers within a task. Because the operating system defines all of the gates 
in a system, it can ensure that all gates allow entry into a few trusted procedures only (such as 
those that allocate memory or perform I/O). 


Gate descriptors follow the data access rules of privilege; that is, gates can be accessed by a task 
if the EPL is equal to or more privileged than the gate descriptor's DPL. Gates follow the control 
transfer rules of privilege and therefore may only transfer control to a more privileged level. 


Call Gates are accessed via a CALL instruction and are syntactically identical to calling a normal 
subroutine. When an inter-level Intel486 processor call gate is activated, the following actions 
occur. 


1. Load CS:EIP from gate check for validity. 

2. SS is pushed zero-extended to 32 bits. 

3. ESP is pushed. 

4. Copy Word Count 32-bit parameters from the old stack to the new stack. 
5. Push Return address on stack. 


The procedure is identical for 80286 Call gates, except that 16-bit parameters are copied and 16- 
bit registers are pushed. 


Interrupt gates and trap gates work in a similar fashion as the call gates, except there is no copying 
of parameters. The only difference between trap and interrupt gates is that control transfers 
through an interrupt gate disable further interrupts (i.e., the IF bit is set to 0), and trap gates leave 
the interrupt status unchanged. 
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6.3.6 Task Switching 


An important attribute of any multi-tasking/multi-user operating system is its ability to switch be- 
tween tasks or processes rapidly. The Intel486 processor directly supports this operation by pro- 
viding a task switch instruction in hardware. The Intel486 processor task switch operation saves 
the entire state of the machine (all of the registers, address space, and a link to the previous task), 
loads a new execution state, performs protection checks, and commences execution in the new 
task, in about 10 microseconds. Like transfer of control via gates, the task switch operation is in- 
voked by executing an inter-segment JMP or CALL instruction that refers to a Task State Seg- 
ment (TSS) or a task gate descriptor in the GDT or LDT. An INT n instruction, exception, trap, 
or external interrupt may also invoke the task switch operation if there is a task gate descriptor in 
the associated IDT descriptor slot. 


The TSS descriptor points to a segment (see Figure 6-14) containing the entire Intel486 processor 
execution state whereas a task gate descriptor contains a TSS selector. The Intel486 processor 
supports both 80286 and Intel486 processor-style TSSs. Figure 6-16 shows an 80286 TSS. The 
limit of an Intel486 processor TSS must be greater than 0064H (002BH for an 80286 TSS), and 
can be as large as 4 Gbytes. In the additional TSS space, the operating system is free to store ad- 
ditional information, such as the reason the task is inactive, the time the task has spent running, 
and the open files belonging to the task. 
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Figure 6-16. 80286 TSS 
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Each task must have a TSS associated with it. The current TSS is identified by a special register 
in the Intel486 processor called the Task State Segment Register (TR). This register contains a 
selector referring to the task state segment descriptor that defines the current TSS. A hidden base 
register and limit register associated with TR are loaded whenever TR is loaded with a new se- 
lector. Returning from a task is accomplished by the IRET instruction. When IRET is executed, 
control is returned to the task that was interrupted. The currently executing task’s state is saved 
in the TSS and the old task state is restored from its TSS. 


Several bits in the flag register and machine status word (CRO) give information about the state 
of a task that is useful to the operating system. The Nested Task (NT) (bit 14 in EFLAGS) con- 
trols the function of the IRET instruction. If NT = 0, the IRET instruction performs the regular 
return; when NT = 1, IRET performs a task switch operation back to the previous task. The NT 
bit is set or reset in the following fashion: 


When a CALL or INT instruction initiates a task switch, the new TSS is marked busy and the 
back link field of the new TSS set to the old TSS is selector. The NT bit of the new task is set by 
CALL or INT initiated task switches. An interrupt that does not cause a task switch clears NT. 
(The NT bit is restored after execution of the interrupt handler.) NT may also be set or cleared by 
POPF or IRET instructions. 


The Intel486 processor task state segment is marked busy by changing the descriptor type field 
from TYPE 9H to TYPE BH. An 80286 TSS is marked busy by changing the descriptor type field 
from TYPE | to TYPE 3. Use of a selector that references a busy task state segment causes an 
exception 13. 


The Virtual Mode (VM) bit 17 is used to indicate if a task is a virtual 8086 task. If VM = 1, the 
tasks use the Real Mode addressing mechanism. The virtual 8086 environment is entered and ex- 
ited only via a task switch (see Section 6.5, “Virtual 8086 Environment’). 


The T bit in the Intel486 processor TSS indicates that the processor should generate a debug ex- 
ception when switching to a task. If T= 1, a debug exception | is generated upon entry to a new 
task 


6.3.6.1 Floating-Point Task Switching 


The FPU's state is not automatically saved when a task switch occurs, because the incoming task 
may not use the FPU. The Task Switched (TS) Bit (bit 3 in the CRO) helps identify the FPU’s 
state in a multi-tasking environment. Whenever the Intel OverDrive processors switch tasks, they 
set the TS bit. The Intel OverDrive processors detect the first use of a processor extension instruc- 
tion after a task switch and causes the processor extension not available exception 7. The excep- 
tion handler for exception 7 may then decide whether to save the state of the FPU. A processor 
extension not present exception (7) occurs when attempting to execute a Floating-Point or WAIT 
instruction if the Task Switched and Monitor coprocessor extension bits are both set (i.e., TS = 1 
and MP = 1). 
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6.3.7 Initialization and Transition to Protected Mode 


Because the Intel486 processor begins executing in Real Mode immediately after RESET, it is 
necessary to initialize the system tables and registers with the appropriate values. 


The GDT and IDT registers must refer to a valid GDT and IDT. The IDT should be at least 256- 
bytes long, and GDT must contain descriptors for the initial code and data segments. Figure 6-17 
shows the tables and Figure 6-18 shows the descriptors needed for a simple Protected Mode 
Intel486 processor system. It has a single code and single data/stack segment, each four-Gbytes 
long, and a single privilege level, PL = 0. 


The actual method of enabling Protected Mode is to load CRO with the PE bit set via the MOV 
CRO, R/M instruction. 


After enabling Protected Mode, the next instruction should execute an intersegment JMP to load 
the CS register and flush the instruction decode queue. The final step is to load all of the data seg- 
ment registers with the initial selector values. 
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Figure 6-17. Simple Protected System 


An alternate approach to entering Protected Mode that is especially appropriate for multi-tasking 
operating systems is to use the built-in task-switch to load all the registers. In this case, the GDT 
contains two TSS descriptors in addition to the code and data descriptors needed for the first task. 
The first JMP instruction in Protected Mode jumps to the TSS, causing a task switch and loading 
all of the registers with the values stored in the TSS. Because a task switch saves the state of the 
current task in a task state segment, the Task State Segment register should be initialized to point 
to a valid TSS descriptor. 
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Figure 6-18. GDT Descriptors for Simple System 
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6.4 PAGING 


6.4.1. Paging Concepts 


Paging is another type of memory management useful for virtual memory multi-tasking operating 
systems. Unlike segmentation, which modularizes programs and data into variable length seg- 
ments, paging divides programs into multiple uniform size pages. Pages bear no direct relation to 
the logical structure of a program. Whereas segment selectors can be considered the logical 
“name” of a program module or data structure, a page most likely corresponds to only a portion 
of a module or data structure. 


By taking advantage of the locality of reference displayed by most programs, only a small number 
of pages from each active task need be in memory at any moment. 


6.4.2 Paging Organization 


6.4.2.1 Page Mechanism 


The Intel486 processor uses two levels of tables to translate the linear address (from the segmen- 
tation unit) to a physical address. There are three components to the paging mechanism of the 
Intel486 processor: the page directory, the page tables, and the page itself (page frame). All mem- 
ory-resident elements of the Intel486 processor paging mechanism are 4 Kbytes. A uniform size 
for all of the elements simplifies memory allocation and reallocation schemes by eliminating 
problems with memory fragmentation. Figure 6-19 shows how the paging mechanism works. 


6.4.2.2 Page Descriptor Base Register 


CR2 is the Page Fault Linear Address register. It holds the 32-bit linear address that caused the 
last page fault detected. 


CR3 is the Page Directory Physical Base Address register. It contains the physical starting ad- 
dress of the page directory. The lower 12 bits of CR3 are always zero to ensure that the page di- 
rectory is always page aligned. Loading it via a MOV CR3 reg instruction causes the page table 
entry cache to be flushed, as does a task switch through a TSS that changes the value of CRO (see 
Section 6.4.5, “Translation Lookaside Buffer’). 


6.4.2.3 Page Directory 


The Page Directory is 4 Kbytes long and allows up to 1024 page directory entries. Each page di- 
rectory entry contains the address of the next level of tables, the Page Tables and information 
about the page table. The contents of a page directory entry are shown in Figure 6-20. The upper 
10 bits of the linear address (A[31:22]) are used as an index to select the correct page directory 
entry. 
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6.4.2.4 Page Tables 


Each Page Table is 4 Kbytes and holds up to 1024 page table entries. Page table entries contain 
the starting address of the page frame and statistical information about the page (see Figure 6-21). 
Address bits A[21:12] are used as an index to select one of the 1024 page table entries. The 20 
upper-bit page frame address is concatenated with the lower 12 bits of the linear address to form 
the physical address. Page tables can be shared between tasks and swapped to disks. 
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Figure 6-19. Paging Mechanism 


Figure 6-20. Page Directory Entry (Points to Page Table) 
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Figure 6-21. Page Table Entry (Points to Page) 
6.4.2.5 Page Directory/Table Entries 


The lower 12 bits of the page table entries and page directory entries contain statistical informa- 
tion about pages and page tables, respectively. The P (Present) bit 0 indicates whether a page di- 
rectory or page table entry can be used in address translation. If P = 1 the entry can be used for 
address translation. If P = 0 the entry cannot be used for translation, and all other bits are available 
for use by the software. For example the remaining 31 bits could be used to indicate where on the 
disk the page is stored. 


Bit 5, the Accessed (A) bit, is set by the Intel486 processor for both types of entries before a read 
or write access occurs to an address covered by the entry. Bit 6, the D (Dirty) bit, is set to 1 before 
a write to an address covered by that page table entry occurs. The D bit is undefined for page di- 
rectory entries. When the P, A and D bits are updated by the Intel486 processor, a read-modify- 
write cycle is generated that locks the bus and prevents conflicts with other processors or periph- 
erals. Software that modifies these bits should use the LOCK prefix to ensure the integrity of the 
page tables in multi-master systems. 


The three bits marked OS Reserved in Figures 6-20 and 6-21 (bits 11:9) are software-definable. 
OSs are free to use these bits for any purpose. An example of the use of the OS Reserved bits is 
storing information about page aging. By keeping track of how long a page has been in memory 
since being accessed, an operating system can implement a page replacement algorithm such as 
least recently used. 


Bit 2, the User/Supervisor (U/S) bit, and bit 1, the Read/Write (R/W) bit, are used to provide pro- 
tection attributes for individual pages. 
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6.4.3 Page Level Protection (R/W, U/S Bits) 


The Intel486 processor provides a set of protection attributes for paging systems. The paging 
mechanism distinguishes between two levels of protection: user, which corresponds to level 3 of 
the segmentation based protection; and supervisor, which encompasses all of the other protection 
levels (0, 1, 2). 


The R/W and U/S bits are used in conjunction with the WP bit in the flags register (EFLAGS). 
The Intel386 processor does not contain the WP bit. The WP bit has been added to the Intel486 
processor to protect read-only pages from supervisor write accesses. The Intel386 processor al- 
lows a read-only page to be written from protection levels 0, 1, or 2. WP=0 is the Intel386 pro- 
cessor compatible mode. When WP=0, the supervisor can write to a read-only page as defined by 
the U/S and R/W bits. When WP=1, supervisor access to a read-only page (R/W=0) causes a page 
fault (exception 14). 


Table 6-4 shows the affect of the WP, U/S and R/W bits on accessing memory. When WP=0, the 
supervisor can write to pages regardless of the state of the R/W bit. When WP=1 and R/W=0, the 
supervisor cannot write to a read-only page. A user attempt to access a supervisor-only page 
(U/S=0) or to write to a read-only page causes a page fault (exception 14). 


Table 6-4. Page Level Protection Attributes 


U/S R/W WP User Access Supervisor Access 

0 0 0 None Read/Write/Execute 
0 1 0 None Read/Write/Execute 
1 0 0 Read/Execute Read/Write/Execute 
1 1 0 Read/Write/Execute Read/Write/Execute 
0 0 1 None Read/Execute 

0 1 1 None Read/Write/Execute 
1 0 1 Read/Execute Read/Execute 

1 1 1 Read/Write/Execute Read/Write/Execute 


The R/W and U/S bits provide protection from user access on a page-by-page basis because the 
bits are contained in the page table entry and the page directory table. The U/S and R/W bits in 
the first-level page directory table apply to all entries in the page table pointed to by that directory 
entry. The U/S and R/W bits in the second-level page table entry apply only to the page described 
by that entry. The most restrictive U/S and R/W bits from the page directory table and the page 
table entry are used to address a page. 


Example: If the U/S and R/W bits for the page directory entry were 10 (user read/execute) and 
the U/S and R/W bits for the page table entry were 01 (no user access at all), the access rights for 
the page would be 01, the numerically smaller of the two. 
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NOTE 


Note that a given segment can be easily made read-only for level 0, 1, or 2 via 
use of segmented protection mechanisms. 


6.4.4 Page Cacheability (PWT and PCD Bits) 


See Section 7.6, “Page Cacheability,” for a detailed description of page cacheability and the PWT 
and PCD bits. 


6.4.5 Translation Lookaside Buffer 


The Intel486 processor paging hardware is designed to support demand paged virtual memory 
systems. However, performance would degrade substantially if the Intel486 processor were re- 
quired to access two levels of tables for every memory reference. To solve this problem, the 
Intel486 processor keeps a cache of the most recently accessed pages. This cache is called the 
Translation Lookaside Buffer (TLB). The TLB is a four-way set associative 32-entry page table 
cache. It automatically keeps the most commonly used page table entries in the Intel486 proces- 
sor. The 32-entry TLB coupled with a 4 Kbyte page size, results in coverage of 128 Kbytes of 
memory addresses. For many common multi-tasking systems, the TLB has a hit rate of about 
98%. This means that the Intel486 processor must access the two-level page structure only on 2% 
of all memory references. Figure 6-22 illustrates how the TLB complements the Intel486 proces- 
sor's paging mechanism. 
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Figure 6-22. Translation Lookaside Buffer 
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Reading a new entry into the TLB (TLB refresh) is a two step process handled by the Intel486 
processor hardware. The sequence of data cycles to perform a TLB refresh is as follows: 


1. Read the correct page directory entry, as pointed to by the page base register and the upper 
10 bits of the linear address. The page base register is in Control Register 3. 


Optionally, perform a locked read/write to set the accessed bit in the directory entry. The 
directory entry is read twice if the Intel486 processor needs to set any of the bits in the 
entry. If the page directory entry changes between the first and second reads, the data 
returned for the second read is used. 


2. Read the correct entry in the Page Table and place the entry in the TLB. 


Optionally, perform a locked read/write to set the accessed and/or dirty bit in the page 
table entry. Again, note that the page table entry actually is read twice if the Intel486 
processor needs to set any of the bits in the entry. Like the directory entry, if the data 
changes between the first and second read, the data returned for the second read is used. 


Note that the directory entry must always be read into the Intel486 processor, because directory 
entries are never placed in the paging TLB. Page faults can be signaled from either the page di- 
rectory read or the page table read. Page directory and page table entries can be placed in the 
Intel486 processor on-chip cache like normal data. 


6.4.6 Paging Operation 


The paging hardware operates in the following fashion. The paging unit hardware receives a 32- 
bit linear address from the segmentation unit. The upper 20 linear address bits are compared with 
all 32 entries in the TLB to determine if there is a match. If there is a match (i.e., a TLB hit), then 
the 32-bit physical address is calculated and is placed on the address bus. 


If the page table entry is not in the TLB, the Intel486 processor reads the appropriate page direc- 
tory entry. When P = | on the page directory entry, indicating that the page table is in memory, 
then the Intel486 processor reads the appropriate page table entry and sets the Access bit. When 
P= 1 on the page table entry, indicating that the page is in memory, the Intel486 processor up- 
dates the Access and Dirty bits as needed and fetches the operand. The upper 20 bits of the linear 
address, read from the page table, are stored in the TLB for future accesses. However, if P = 0 for 
either the page directory entry or the page table entry, the Intel486 processor generates a page 
fault, exception 14. 


The Intel486 processor also generates an exception 14 page fault if the memory reference violated 
the page protection attributes such as U/S or R/W (for example, when trying to write to a read- 
only page). CR2 holds the linear address that caused the page fault. If a second page fault occurs 
while the Intel486 processor is attempting to enter the service routine for the first, the Intel486 
processor invokes the page fault handler a second time, rather than the double fault (exception 8) 
handler. Because exception 14 is classified as a fault, CS: EIP points to the instruction causing 
the page fault. The 16-bit error code pushed as part of the page fault handler contains status bits 
that indicate the cause of the page fault. 


The 16-bit error code is used by the operating system to determine how to handle the page fault. 
The upper portion of Figure 6-23 shows the format of the page-fault error code and the interpre- 
tation of the bits. 
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« U/S: The U/S bit indicates whether the access causing 
the fault occurred when the Intel486 processor was 
executing in User Mode (U/S = 1) or in Supervisor 
mode (U/S = 0). 


« WIR: The WIR bit indicates whether the access causing 
the fault was a Read (W/R = 0) or a Write (W/R = 1). 


* P: The P bit indicates whether a page fault was caused 
by a not-present page (P = 0), or by a page level 
protection violation (P = 1). 


Figure 6-23. Page Fault System Information 


NOTE 


Even though the bits in the error code (U/S, W/R, and P) have similar names as 
the bits in the Page Directory/Table Entries, the interpretation of the error code 
bits is different. Figure 6-23 indicates what type of access caused the page 
fault. 
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6.4.7 Operating System Responsibilities 


The Intel486 processor takes care of the page address translation process, relieving the burden 
from an operating system in a demand-paged system. The operating system is responsible for set- 
ting up the initial page tables, and handling any page faults. The operating system also is required 
to invalidate (i.e., flush) the TLB when any changes are made to any of the page table entries. The 
operating system must reload CR3 to cause the TLB to be flushed. 


Setting up the tables requires loading CR3 with the address of the page directory, and allocating 
space for the page directory and the page tables. The primary responsibilities of the operating sys- 
tem are to implement a swapping policy and handle all of the page faults. 


The operating system must ensure that the TLB cache matches the information in the paging ta- 
bles. In particular, when the operating system sets the P bit of page table entry to zero, the TLB 
must be flushed. Operating systems may want to take advantage of the fact that CR3 is stored as 
part of a TSS, to give every task or group of tasks its own set of page tables. 


6.5 VIRTUAL 8086 ENVIRONMENT 


6.5.1 Executing 8086 Programs 


The Intel486 processor allows the execution of 8086 application programs in both Real Mode and 
in the Virtual 8086 Mode (Virtual Mode). Of the two methods, Virtual 8086 Mode offers the sys- 
tem designer the most flexibility. The Virtual 8086 Mode allows the execution of 8086 applica- 
tions while still allowing the system designer to take full advantage of the Intel486 processor 
protection mechanism. In particular, the Intel486 processor allows the simultaneous execution of 
an 8086 operating system (and its applications) and an Intel486 processor operating system run- 
ning both 80286 and Intel486 processor applications. Thus, on a multi-user Intel486 processor 
computer, one person could be running an MS-DOS* spreadsheet, another person using MS- 
DOS, and a third person could be running multiple UNIX utilities and applications. Each person 
in this scenario would believe that he had sole use of the computer. Figure 6-24 illustrates this 
concept. 


| 6-35 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


Physical Memory 


02000000(H) 


J Page Directory 
Virtual Mode Task 2 


8086 Task 


Ce 
8086 OS —Y 
ZA 00000000(H) 
Page Directory Task 1 Si ay 
Root Page Table as emory 


Virtual Mod Page Directory CL] Task 2 Memory 
Delo ertaae Task 1 7 a 
Bees Tes A \ntel486™ Processor OS Memory 


RQ 8086 OS Memory 


A5216-01 


Figure 6-24. Virtual 8086 Environment Memory Management 


6.5.2 Virtual 8086 Mode Addressing Mechanism 


One of the major differences between Intel486 processor Real and Protected Modes is how the 
segment selectors are interpreted. When the Intel486 processor is executing in Virtual 8086 
Mode, the segment registers are used in an identical fashion to Real Mode. The contents of the 
segment register are shifted left four bits and added to the offset to form the segment base linear 
address. 


The Intel486 processor allows the operating system to specify which programs use the 8086 style 
address mechanism, and which programs use Protected Mode addressing. Through the use of 
paging, the one megabyte address space of the Virtual Mode task can be mapped to anywhere in 
the 4-Gbyte linear address space of the Intel486 processor. Like Real Mode, Virtual Mode effec- 
tive addresses (i.e., segment offsets) that exceed 64 Kbyte cause an exception 13. However, these 
restrictions should not prove to be important, because most tasks running in Virtual 8086 Mode 
are existing 8086 application programs. 
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6.5.3 Paging in Virtual Mode 


The paging hardware allows the concurrent running of multiple Virtual Mode tasks, and provides 
protection and operating system isolation. Although it is not strictly necessary to have the paging 
hardware enabled to run Virtual Mode tasks, it is needed in order to run multiple Virtual Mode 
tasks or to relocate the address space of a Virtual Mode task to physical address space greater than 
one Mbyte. 


The paging hardware allows the 20-bit linear address produced by a Virtual Mode program to be 
divided into up to 256 pages. Each one of the pages can be located anywhere within the maximum 
4-Gbyte physical address space of the Intel486 processor. In addition, because CR3 (the Page Di- 
rectory Base Register) is loaded by a task switch, each Virtual Mode task can use a different map- 
ping scheme to map pages to different physical locations. Finally, the paging hardware allows the 
sharing of the 8086 operating system code between multiple 8086 applications. Figure 6-24 
shows how the Intel486 processor paging hardware enables multiple 8086 programs to run under 
a virtual memory demand paged system. 


6.5.4 Protection and I/O Permission Bitmap 


All Virtual 8086 Mode programs execute at privilege level 3, the level of least privilege. As such, 
Virtual 8086 Mode programs are subject to all of the protection checks defined in Protected 
Mode. (This is different from Real Mode, which implicitly is executing at privilege level 0, the 
level of greatest privilege.) Thus, an attempt to execute a privileged instruction when in Virtual 
8086 Mode causes an exception 13 fault. 


The following are privileged instructions that can be executed only at Privilege Level 0. There- 
fore, attempting to execute these instructions in Virtual 8086 Mode (or anytime CPL > 0) causes 
an exception 13 fault: 


LIDT; MOV DRn, reg; MOV reg,DRn; 
LGDT; MOV TRn, reg; MOV reg, TRn; 
LMSW; MOV CRn,reg; MOV reg,CRn. 
CLTS; 
HLT; 


Several instructions, particularly those applying to the multi-tasking model and protection model, 
are available only in Protected Mode. Therefore, attempting to execute the following instructions 
in Real Mode or in Virtual 8086 Mode generates an exception 6 fault: 


LTR; STR; 
LLDT; SLDT; 
LAR; VERR; 
LSL;  VERW; 
ARPL. 
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The instructions that are IOPL-sensitive in Protected Mode are: 


IN; STI; 
OUT; CLI 
INS; 

OUTS; 

REP INS; 
REP OUTS; 


In Virtual 8086 Mode, a slightly different set of instructions are made IOPL-sensitive. The fol- 
lowing instructions are IOPL-sensitive in Virtual 8086 Mode: 


INT n; STI; 
PUSHF; CLI; 
POPF; IRET 


The PUSHF, POPF, and IRET instructions are IOPL-sensitive in Virtual 8086 Mode only. This 
provision allows the IF flag (interrupt enable flag) to be virtualized to the Virtual 8086 Mode pro- 
gram. The INT n software interrupt instruction is also IOPL-sensitive in Virtual 8086 Mode. 
Note, however, that the INT 3 (opcode 0CCH), INTO, and BOUND instructions are not IOPL- 
sensitive in Virtual 8086 Mode (they are not IOPL sensitive in Protected Mode either). 


Note that the I/O instructions (IN, OUT, INS, OUTS, REP INS, and REP OUTS) are not IOPL- 
sensitive in Virtual 8086 Mode. Rather, the I/O instructions become automatically sensitive to 
the I/O permission bitmap contained in the Intel486 processor Task State Segment. The I/O per- 
mission bitmap, automatically used by the Intel486 processor in Virtual 8086 Mode, is illustrated 
by Figures 6-14 and 6-15. 


The I/O Permission Bitmap can be viewed as a 0-64 Kbit string, which begins in memory at off- 
set Bit_Map_Offset in the current TSS. Bit_Map_Offset must be < DFFFH so the entire bit map 
and the byte FFH that follows the bit map are all at offsets < FFFFH from the TSS base. The 16- 
bit pointer Bit_Map_Offset (15:0) is found in the word beginning at offset 66H (102 decimal) 
from the TSS base, as shown in Figure 6-14. 


Each bit in the I/O permission bitmap corresponds to a single byte-wide I/O port, as illustrated in 
Figure 6-14. If a bit is 0, I/O to the corresponding byte-wide port can occur without generating 
an exception. Otherwise the I/O instruction causes an exception 13 fault. Because every byte- 
wide I/O port must be protectable, all bits corresponding to a word-wide or dword-wide port must 
be 0 for the word-wide or dword-wide I/O to be permitted. If all the referenced bits are 0, the I/O 
is allowed. If any referenced bits are 1, the attempted I/O causes an exception 13 fault. 


Due to the use of a pointer to the base of the I/O permission bitmap, the bitmap may be located 
anywhere within the TSS, or may be ignored completely by pointing the Bit_Map_Offset (15:0) 
beyond the limit of the TSS segment. In the same manner, by adjusting the TSS limit to truncate 
the bitmap, only a small portion of the 64 Kbyte I/O space need have an associated map bit. This 
eliminates the commitment of 8 Kbyte of memory when a complete bitmap is not required. 
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Example of Bitmap for I/O Ports 0-255: Setting the TSS limit to {bit_Map_Offset + 31 + 1} 
(see note below) allows a 32-byte bitmap for the I/O ports 0-255, plus a terminator byte of all 
ones (see note below). This allows the I/O bitmap to control I/O permission to I/O port 0-255, 
but causes an exception 13 fault on attempted I/O to any I/O port 80256 through 65,565. 


NOTE 
Beyond the last byte of I/O mapping information in the I/O permission bitmap, 
there must be a byte containing all ones. The byte of all ones must be within 
the limit of the Intel486 processor TSS segment (see Figure 6-14). 


6.5.5 Interrupt Handling 


In order to fully support the emulation of an 8086 machine, interrupts in Virtual 8086 Mode are 
handled in a unique way. When running in Virtual Mode, all interrupts and exceptions involve a 
privilege change back to the host Intel486 processor operating system. The Intel486 processor 
operating system determines if the interrupt comes from a protected mode application or from a 
Virtual Mode program by examining the VM bit in the EFLAGS image stored on the stack. 


When a Virtual Mode program is interrupted and execution passes to the interrupt routine at level 
0, the VM bit is cleared. However, the VM bit is still set in the EFLAG image on the stack. 


The Intel486 processor operating system in turn handles the exception or interrupt and then re- 
turns control to the 8086 program. The Intel486 processor operating system may choose to let the 
8086 operating system handle the interrupt or it may emulate the function of the interrupt handler. 
For example, many 8086 operating system calls are accessed by PUSHing parameters on the 
stack, and then executing an INT n instruction. If the IOPL is set to 0, then all INT n instructions 
are intercepted by the Intel486 processor operating system. The Intel486 processor operating sys- 
tem could emulate the 8086 operating system's call. Figure 6-25 shows how the Intel486 proces- 
sor operating system could intercept an 8086 operating system's call to “Open a File.” 


An Intel486 processor operating system can provide a Virtual 8086 environment that is totally 
transparent to the application software by intercepting and then emulating 8086 operating sys- 
tem's calls, and intercepting IN and OUT instructions. 
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Figure 6-25. Virtual 8086 Environment Interrupt and Call Handling 


6.5.6 Entering and Leaving Virtual 8086 Mode 


An Intel486 processor is executing in Protected Mode can be switched to Virtual 8086 Mode by 
executing an IRET instruction (at CPL=0), or task switch (at any CPL) to an Intel486 processor 
task whose TSS has a FLAGS image containing a | in the VM bit position. That is, one way to 
enter Virtual 8086 Mode is to switch to a task with an Intel486 processor TSS that has a | in the 
VM bit in the EFLAGS image. The other way is to execute a 32-bit IRET instruction at privilege 
level 0, where the stack has a | in the VM bit in the EFLAGS image. POPF does not affect the 
VM bit, even if the Intel486 processor is in Protected Mode or level 0, and so cannot be used to 
enter Virtual 8086 Mode. PUSHF always pushes a 0 in the VM bit, even if the Intel486 processor 
is in Virtual 8086 Mode, so that a program cannot tell whether it is executing in Real Mode or in 


Virtual 8086 Mode. 
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The VM bit can be set by executing an IRET instruction only at privilege level 0, or by any in- 
struction or interrupt that causes a task switch in Protected Mode (with VM=1 in the new FLAGS 
image). The UM bit can be cleared only by an interrupt or exception in Virtual 8086 Mode. IRET 
and POPF instructions executed in Real Mode or Virtual 8086 Mode do not change the value in 
the VM bit. 


The transition out of Virtual 8086 Mode to Protected Mode occurs only on receipt of an interrupt 
or exception (such as due to a sensitive instruction). In Virtual 8086 Mode, all interrupts and ex- 
ceptions vector through the Protected Mode IDT, and enter an interrupt handler in protected 
Intel486 processor mode. That is, as part of interrupt processing, the VM bit is cleared. 


Because the matching IRET must occur from level 0, if an interrupt or trap gate is used to field 
an interrupt or exception out of Virtual 8086 Mode, the Gate must perform an inter-level interrupt 
only to level 0. Interrupt or trap gates through conforming segments, or through segments with 
DPL>O, raise a GP fault with the CS selector as the error code. 


6.5.6.1 Task Switches to and from Virtual 8086 Mode 


Tasks that can execute in Virtual 8086 Mode must be described by a TSS with the new Intel486 
processor format (TYPE 9 or 11 descriptor). 


A task switch out of Virtual 8086 Mode operates exactly the same as any other task switch out of 
a task with an Intel486 processor TSS. The programmer visible state, including the FLAGS reg- 
ister with the VM bit set to 1, is stored in the TSS. 


The segment registers in the TSS contain 8086 segment base values rather than selectors. 


A task switch into a task described by an Intel486 processor TSS has an additional check to de- 
termine if the incoming task should be resumed in Virtual 8086 Mode. Tasks described by 80286 
format TSSs cannot be resumed in Virtual 8086 Mode, so no check is required (the FLAGS image 
in 80286 format TSS has only the low order 16 FLAGS bits). Before loading the segment register 
images from an Intel486 processor TSS, the FLAGS image is loaded, so that the segment regis- 
ters are loaded from the TSS image as 8086 segment base values. The task is now ready to resume 
in Virtual 8086 Mode. 


6.5.6.2 Transitions Through Trap and Interrupt Gates, and IRET 


A task switch is one way to enter or exit Virtual 8086 Mode. The other method is to exit through 
a trap or interrupt gate as part of handling an interrupt, and to enter as part of executing an IRET 
instruction. The transition out must use an Intel486 processor trap gate (Type 14) or Intel486 pro- 
cessor interrupt gate (Type 15), which must point to a non-conforming level 0 segment (DPL=0) 
in order to permit the trap handler to IRET back to the Virtual 8086 program. The gate must point 
to a non-conforming level 0 segment to perform a level switch to level 0 so the matching IRET 
can change the VM bit. Intel486 processor gates must be used, because 80286 gates save only the 
low 16 bits of the FLAGS register, so that the VM bit is not saved on transitions through the 
80286 gates. Also, the 16-bit IRET (presumably) used to terminate the 80286 interrupt handler 
pops only the lower 16 bits from FLAGS, and does not affect the VM bit. The action taken for an 
Intel486 processor trap or interrupt gate if an interrupt occurs while the task is executing in Vir- 
tual 8086 Mode is given by the following sequence: 
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1. Save the FLAGS register in a temp to push later. Turn off the VM and TF bits and, if the 
interrupt is serviced by an Interrupt Gate, turn off the IF bit also. 


2. Interrupt and trap gates must perform a level switch from level 3 (where the VM86 
program executes) to level 0 (so IRET can return). This process involves a stack switch to 
the stack given in the TSS for privilege level 0. Save the Virtual 8086 Mode SS and ESP 
registers to push in a later step. The segment register load of SS is done as a Protected 
Mode segment load, because the VM bit was turned off in step 1. 


3. Push the 8086 segment register values onto the new stack, in the order: GS, FS, DS, ES. 
These are pushed as 32-bit quantities, with undefined values in the upper 16 bits. Then, 
load these four registers with null selectors (0). 


4. Push the old 8086 stack pointer onto the new stack by pushing the SS register (as 32-bits, 
high bits undefined), then pushing the 32-bit ESP register saved above. 


5. Push the 32-bit FLAGS register saved in step 1. 


6. Push the old 8086 instruction pointer onto the new stack by pushing the CS register (as 32- 
bits, high bits undefined), then pushing the 32-bit EIP register. 


7. Load the new CS:EIP value from the interrupt gate, and begin execution of the interrupt 
routine in Protected Mode. 


The transition out of Virtual 8086 Mode performs a level change and stack switch, in addition to 
changing back to Protected Mode. In addition, all of the 8086 segment register images are stored 
on the stack (behind the SS:ESP image), and then loaded with null (0) selectors before entering 
the interrupt handler. This permits the handler to safely save and restore the DS, ES, FS, and GS 
registers as 80286 selectors. This is needed so that interrupt handlers that do not care about the 
mode of the interrupted program can use the same prolog and epilog code for state saving (i.e., 
push all registers in prolog, pop all in epilog), regardless of whether or not a “native” mode or 
Virtual 8086 Mode program was interrupted. Restoring null selectors to these registers before ex- 
ecuting the IRET instruction does not cause a trap in the interrupt handler. Interrupt routines that 
obtain values from the segment registers or return values to segment registers have to obtain/re- 
turn them from the 8086 register images pushed onto the new stack. They need to know the mode 
of the interrupted program in order to know where to find/return segment registers, and also to 
know how to interpret segment register values. 
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The IRET instruction performs the inverse of the above sequence. Only the extended Intel486 
processor IRET instruction (operand size=32) can be used, and must be executed at level 0 to 
change the VM bit to 1. 


1. 


If the NT bit in the FLAGs register is on, an inter-task return is performed. The current 
state is stored in the current TSS, and the link field in the current TSS is used to locate the 
TSS for the interrupted task that is to be resumed. Otherwise, continue with the following 
sequence. 


Read the FLAGS image from SS:8[ESP] into the FLAGS register. This sets VM to the 
value active in the interrupted routine. 


Pop off the instruction pointer CS:EIP. EIP is popped first, then a 32-bit word is popped 
that contains the CS value in the lower 16 bits. If VM=0, this CS load is done as a 
Protected Mode segment load; if VM=1, this is done as an 8086 segment load. 


Increment the ESP register by four to bypass the FLAGS image that was “popped” in 
step 1. 


If VM=1, load segment registers ES, DS, FS, and GS from memory locations SS:[ESP+8], 
SS:[ESP+12], SS:[ESP+16], and SS:[ESP+20], respectively, where the new value of ESP 
stored in step 4 is used. When VM=1, these are done as 8086 segment register loads; when 
VM=0, check that the selectors in ES, DS, FS, and GS are valid in the interrupted routine. 
Null out invalid selectors to trap if an attempt is made to access through them. 


If RPL(CS) > CPL, pop the stack pointer SS:ESP from the stack. The ESP register is 
popped first, followed by 32-bits containing SS in the lower 16 bits. If VM=0, SS is 
loaded as a Protected Mode segment register load. If VM=1, an 8086 segment register 
load is used. 


Resume execution of the interrupted routine. The VM bit in the FLAGS register (restored 
from the interrupt routine's stack image in step 1) determines whether the Intel486 
processor resumes the interrupted routine in Protected Mode or Virtual 8086 Mode. 
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CHAPTER 7 
ON-CHIP CACHE 


All members of the Intel486™ processor family, except the IntelIDX4™ processor, contain an on- 
chip 8-Kbyte cache. The IntelDX4 processor has a 16-Kbyte cache, as discussed in Section 7.1.1. 
The cache is software-transparent to maintain binary compatibility with previous generations of 
the Intel Architecture. 


The on-chip cache is designed for maximum flexibility and performance. The cache has several 
operating modes, offering flexibility during program execution and debugging. Memory areas 
can be defined as non-cacheable by software and external hardware. Protocols for cache line in- 
validations and cache replacement are implemented in hardware, easing system design. 


7.1. CACHE ORGANIZATION 


The on-chip cache is a unified code and data cache; that is, the cache is used for both instruction 
and data accesses and acts on physical addresses. 


The cache organization is 4-way set associative and each line is 16 bytes wide. The 8 Kbytes of 
cache memory are logically organized as 128 sets, each containing four lines. 


The cache memory is physically split into four 2-Kbyte blocks, each containing 128 lines (see 
Figure 7-1). There are 128 21-bit tags associated with each 2-Kbyte block. There is a valid bit for 
each line in the cache. Each line in the cache is either valid or not valid; there are no provisions 
for partially valid lines. 
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Figure 7-1. On-Chip Cache Physical Organization 
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For all Intel486 processors except the Write-Back Enhanced IntelDX4 processor, the on-chip 
cache is write-through only. All writes drive an external write bus cycle in addition to writing the 
information to the internal cache when the write is a cache hit. A write to an address not contained 
in the internal cache is written to external memory only. Cache allocations are not made on write 
misses. 


The Write-Back Enhanced IntelDX4 processor supports two modes of operation with respect to 
internal cache configurations: Standard Bus Mode (write-through cache) and Enhanced Bus 
Mode (write-back cache). Standard Bus Mode operation for the Write-Back Enhanced IntelDX4 
processor is the same as the write-through cache for other Intel486 processors (see Section 7.1.2, 
“Write-Back Enhanced IntelDX4™ Processor Cache” and other write-back enhanced sections 
below for write-back cache information). 
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7.1.1. IntelDX4™ Processor On-Chip Cache 


The IntelDX4 processor contains a 16-Kbyte write-through cache. The 16 Kbytes of cache mem- 
ory are logically organized as 256 sets, each containing four lines. 


The cache memory is physically split into four 4-Kbyte blocks, each containing 256 lines (see 
Figure 7-1). There are 256 20-bit tags associated with each 2-Kbyte block. 


All other details listed in Section 7.1 for the 8-Kbyte on-chip cache also apply to the IntelDX4 
on-chip cache. 


7.1.2. Write-Back Enhanced IntelDX4™ Processor Cache 


The Write-Back Enhanced IntelDX4 processor implements a unified cache, with a total cache 
size of 16 Kbytes. The processor’s on-chip cache supports a modified MESI (modified / exclusive 
/ shared / invalid) write-back cache consistency protocol. 


The Write-Back Enhanced IntelDX4 processor’s internal cache is configurable as write-back or 
write-through on a line-by-line basis, provided the cache is enabled for write-back operation. The 
cache is enabled for write-back operation by driving the WB/WT# pin to a high state for at least 
two clocks before and two clocks after the falling edge of RESET. Cache write-back and invali- 
dations can be initiated by hardware or software. Protocols for cache consistency and line re- 
placement are implemented in hardware to ease system design. 


Once the cache configuration is selected, the Write-Back Enhanced IntelDX4 processor contin- 
ues to operate in the selected configuration and can be changed to a different configuration only 
by starting the RESET process again. Asserting SRESET does not change the operating mode of 
the processor. WB/WT# has an internal pull down; when WB/WT# is unconnected, the processor 
is in Standard Bus Mode, i.e., the on-chip cache is write-through. Table 7-1 lists the two modes 
of operation and the differences between the two modes. 


Unless specifically noted, the following sections apply to the Write-Back Enhanced IntelDX4 
processor in Standard Bus Mode (write-through cache) and all other Intel486 processors. 
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Table 7-1. Write-Back Enhanced IntelDX4™ 
Processor WB/WT# Initialization 


State of WB/WT# 
at Falling Edge Effect on Write-Back Enhanced IntelDX4 Processor Operation 
of RESET 


WB/WT# = LOW | Processor is in Standard Bus Mode (write-through cache) 
1. When FLUSH# is asserted, the internal cache is invalidated in one system CLK. 
2. No Special FLUSH# acknowledge cycles appear on the bus after the assertion of 
FLUSH#. 
3. All write-back specific inputs are ignored (INV, WB/WT#). 
4. SRESET does not clear the SMBASE register. It behaves much like a RESET 


(invalidating the on-chip cache and resetting the CRO register, for example). SRESET is 
not an interrupt. 


WB/WT# = HIGH | Processor is in Enhanced Bus Mode (Write-Back Cache) 


1. Write backs are performed when a cache flush is requested (via the FLUSH# pin or 
the WBINVD instruction). The system must watch for the FLUSH# special cycles to 
determine the end of the flush. 

2. The special FLUSH# acknowledge cycles appear on the bus after the assertion of 
the FLUSH# and after all the cache write backs (if any) are completed on the bus. 

3. WB/WT# is sampled on a line-by-line basis to determine the state of a line to be 
allocated in the cache (as a write through (S state) or as write back (E state)). 

4. The WB/WT# and INV inputs are no longer ignored. HITM# and CACHE# are driven 
during appropriate bus cycles. 

5. PLOCK# is always driven inactive. 

6. SRESET is an interrupt. SRESET does not reset the SMBASE register or flush the 
on-chip cache. The CRO register gets the same values as after RESET, with the 
exception of the CD and NW bits. These two bits retain their previous status. See 
Section 9.2.18.4, “Soft Reset (SRESET)” and Table 7-7 for details on SRESET for 
enhanced bus (write-back) mode. 


Table 7-2. Cache Operating Modes 


cD NW Operating Mode 
1 1 Cache fills disabled, write-through and invalidates disabled. 
1 0 Cache fills disabled, write-through and invalidates enabled. 
0 1 INVALID. When CRO is loaded with this configuration of bits, a GP fault with error code of 
0 is raised. 
0 0 Cache fills enabled, write-through and invalidates enabled. 
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7.2 CACHE CONTROL 


Control of the cache is provided by the CD and NW bits in CRO. CD enables and disables the 
cache. NW controls memory write-throughs and invalidates. 


The CD and NW bits define four operating modes of the on-chip cache, as given in Table 7-2. 
These modes provide flexibility in how the on-chip cache is used. 


CD=1, NW=1 
The cache is completely disabled by setting CD=1 and NW=1 and then flushing the cache. 


This mode may be useful for debugging programs in which it is important to see all 
memory cycles at the pins. Writes that hit in the cache do not appear on the external bus. 


It is possible to use the on-chip cache as fast static RAM by “pre-loading” certain memory 
areas into the cache and then setting CD=1 and NW=1. Pre-loading can be done by careful 
choice of memory references with the cache turned on or by using of the testability 
functions (see Section B.2, “On-Chip Cache Testing’). When the cache is turned off, the 
memory mapped by the cache is “frozen” into the cache because fills and invalidates are 
disabled. 


CD=1, NW=0 
Cache fills are disabled but write-throughs and invalidates are enabled. This mode is the 
same as if the KEN# pin was strapped high, disabling cache fills. Write-throughs and 
invalidates still may occur to keep the cache valid. This mode is useful when the software 
must disable the cache for a short period of time, and then re-enable it without flushing the 
original contents. 

CD=0, NW=1 
Invalid. When CRO is loaded with this bit configuration, a General Protection fault with an 
error code of 0 occurs. 

CD=0, NW=0 
This is the normal operating mode. 

Completely disabling the cache is a two-step process. First, CD and NW must be set to 1, and 


then the cache must be flushed. When the cache is not flushed, cache hits on reads still occur and 
data is read from the cache. 
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7.2.1. Write-Back Enhanced IntelDX4™ Processor Cache Control and Operating 
Modes 


The Write-Back Enhanced IntelDX4 processor retains the use of CRO.CD and CRO.NW when the 
1,1 state forces a cache-off condition after RESET and the 0,0 state is the normal run state. Table 
7-3 defines these control bits when the cache is enabled for write-back operation. The values in 
Table 7-3 are also valid when the cache is in write-back mode and some lines are in a write- 
through state. 


CD=1, NW=1 
The 1,1 state is best used when no lines are allocated, which occurs naturally after RESET 
(but not SRESET), but must be forced (e.g., by the WBINVD instruction) when entered 
during normal operation. In these cases, the Write-Back Enhanced IntelDX4 processor 
operates as if it had no cache at all. 


When the 1,1 state is exited, lines that are allocated as write-back are written back upon a 
snoop hit or replacement cycle. Lines that were allocated as write-through (and later 
modified while in the 1,1 state) never appear on the bus. 


CD=1, NW=0 
The only difference between this state and the normal 0,0 “run” state is that new line fills 
(and the line replacements that result from capacity limitations) do not occur. This causes 
the contents of the cache to be locked in, unless lines are invalidated using snoops. 


7.3. CACHE LINE FILLS 


Any area of memory can be cached in the Intel486 processor. Non-cacheable portions of memory 
can be defined by the external system or by software. The external system can inform the Intel486 
processor that a memory address is non-cacheable by returning the KEN# pin inactive during a 
memory access. (Refer to Section 10.3.3, “Cacheable Cycles.”) Software can prevent certain pag- 
es from being cached by setting the PCD bit in the page table entry. 


A read request can be generated from program operation or by an instruction pre-fetch. The data 
is supplied from the on-chip cache when a cache hit occurs on the read address. When the address 
is not in the cache, a read request for the data is generated on the external bus. 


When the read request is to a cacheable portion of memory, the Intel486 processor initiates a 
cache line fill. During a line fill a 16-byte line is read into the Intel486 processor. Cache line fills 
are generated only for read misses. Write misses never cause a line in the internal cache to be al- 
located. When a cache hit occurs on a write, the line is updated. Cache line fills can be performed 
over 8- and 16-bit buses using the dynamic bus sizing feature. Refer to Section 10.1.2, “Dynamic 
Data Bus Sizing,” for a description of dynamic bus sizing and Section 10.3.3, “Cacheable Cy- 
cles,” for further information on cacheable cycles. 
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Table 7-3. Write-Back Enhanced IntelDX4™ Processor 


CRO, CD, NW Read Hit Read Miss WRITE HIT 1) Write Mess| Snoops 

1,1 read cache read bus write cache write bus not 
(state after reset) (no fill) (no write-through) accepted 
1,0 read cache read bus write cache, write bus if S| write bus normal 

(no fill) operation 
0,1 This is a fault-protected disallowed state. A GP(0) occurs when an attempt is made 

to load CRO with this state. 

0,0 read cache read bus, write cache, write bus if S| write bus normal 
(state DURING line fill operation 
normal operation) 


NOTE: Normal MESI state transitions occur on write hits in all legal states. 


7.4. CACHE LINE INVALIDATIONS 


The Intel486 processors contain both a hardware and software mechanism for invalidating inter- 
nal cache lines. Cache line invalidations are needed to keep the cache contents consistent with 
external memory. 


Refer to Section 10.3.8, “Invalidate Cycles,” for further information on cache line invalidations. 


7.4.1. Write-Back Enhanced IntelDX4™ Processor Snoop Cycles and Write- 


Back Mode Invalidation 


In Enhanced Bus Mode, the Write-Back Enhanced IntelDX4 processor performs invalidations 
differently than other Intel486 processors. Snoop cycles are initiated by the system to determine 
whether a line is present in the cache, and what the state is. Snoop cycles may be classified further 
as Inquire cycles or Invalidate cycles. When another bus master initiates a memory read cycle in- 
quire cycles are driven to the Write-Back Enhanced IntelDX4 processor to determine whether the 
processor cache contains the latest data. When the snooped line is in the Write-Back Enhanced 
IntelD X4 processor’s cache and the line contains the most recent information, the processor must 
schedule a write back of the data. Inquire cycles are driven with INV = ‘0’. Invalidate cycles are 
driven to the Write-Back Enhanced IntelDX4 processor when the other bus master initiates a 
memory write cycle to determine whether the Write-Back Enhanced IntelDX4 processor cache 
contains the snooped line. The invalidate cycles are driven with INV = ‘1’, so that when the 
snooped line is in the on-chip cache, the line is invalidated. Snoop cycles are described in detail 
in Section 10.3, “Bus Functional Description.” 


The Write-Back Enhanced IntelDX4 processor has control mechanisms (including snooping) for 
writing back the modified lines and invalidating the cache. There are special bus cycles associat- 
ed with write-backs and with invalidation. All of the Write-Back Enhanced IntelDX4 processor’ s 
special cycles require acknowledgment by RDY# or BRDY#. During the special cycles, the ad- 
dresses shown in the Table 7-4 are driven onto the address bus and the data bus is left undefined. 
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7.5 CACHE REPLACEMENT 


Before a line is placed in its internal cache, the Intel486 processor checks whether there is a non- 
valid line in the set; that line is replaced first. When all four lines in the set are valid, a pseudo 
least-recently-used mechanism is used to determine which line should be replaced. 


A valid bit is associated with each line in the cache. Before a line is placed in a set, the four valid 
bits are checked to see whether there is a non-valid line that can be replaced. When a non-valid 
line is found, that line is marked for replacement. 


The four lines in the set are labeled 10, 11, 12, and 13. The order in which the valid bits are checked 
during an invalidation is 10, 11, 12 and 13. All valid bits are cleared when the processor is reset or 
when the cache is flushed. 


Table 7-4. Encoding of the Special Cycles for Write-Back Cache 


Cycle Name M/lO# D/C# W/R# BE[3:0]# Al4:2] 
Write-Backt 0 0 1 0111 000 
First Flush Ack Cycle* 0 0 1 0111 001 
Flusht 0 0 1 1101 000 
Second Flush Ack Cycle* 0 0 1 1101 001 
Shutdown 0 0 1 1110 000 
HALT 0 0 1 1011 000 
Stop Grant Ack Cycle 0 0 1 1011 100 


*Write-Back Enhanced IntelDX4™ processor only. FLUSH is present on all Intel486™ processor, but differs for Standard 
Mode. 


Replacement in the cache is handled by a pseudo least recently used (LRU) mechanism when all 
four lines in a set are valid. Three bits, BO, B1 and B2, are defined for each of the 128 sets in the 
cache. These bits are called the LRU bits. The LRU bits are updated for every hit or replacement 
in the cache. 


If the most recent access to the set was to 10 or 11, BO is set to 1. If the most recent access was to 
12 or 13, BO is set to 0. If the most recent access to 11:10 was to 10, B1 is set to 1; otherwise, B1 
is set to 0. If the most recent access to 13:12 was to 12, B2 is set to 1; otherwise, B2 is set to 0. 


The pseudo LRU mechanism works in the following manner: When a line must be replaced, the 
cache first selects which of lines 11:10 and 13:12 was least recently used. Then the cache deter- 
mines which of the two lines was least recently used and mark it for replacement. This decision 
tree is shown in Figure 7-2. 
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All four lines No 
in the set valid? Replace 
non-valid line 
Yes 
BO = 0? 
Yes: 10 or 11 No: 12 or 13 
least recently used least recently used 
Bi =0? B2 = 0? 
Yes No Yes No 
Replace Replace Replace Replace 
10 ih l2 13 


Figure 7-2. On-Chip Cache Replacement Strategy 


7.6 PAGE CACHEABILITY 


Two bits for cache control, PWT and PCD, are defined in the page table and page directory en- 
tries. The states of these bits are driven out on the PWT and PCD pins during memory access cy- 
cles. 


The PWT bit controls the write policy for second-level caches used with the Intel486 processor. 
Setting PWT=1 defines a write-through policy for the current page while PWT=0 defines the pos- 
sibility of write-back. The state of PWT is ignored internally by the Intel486 processor for on- 
chip cache in write through mode. 


The PCD bit controls cacheability on a page-by-page basis. The PCD bit is internally AND’ed 
with the KEN# signal to control cacheability on a cycle-by-cycle basis (see Figure 7-3). PCD=0 
enables caching while PCD=1 forbids it. Note that cache fills are enabled when PCD=0 AND 
KEN#=0. This logical AND is implemented physically with a NOR gate. 


The state of the PCD bit in the page table entry is driven on the PCD pin when a page in external 
memory is accessed. The state of the PCD pin informs the external system of the cacheability of 
the requested information. The external system then returns KEN#, telling the Intel486 processor 
whether the area is cacheable. The Intel486 processor initiates a cache line fill when PCD and 
KEN?# indicate that the requested information is cacheable. 


The PCD bit is OR’ed with the CD (cache disable) bit in control register 0 to determine the state 
of the PCD pin. When CD=1, the Intel486 processor forces the PCD pin HIGH. When CD=0, the 
PCD pin is driven with the value for the page table entry/directory (see Figure 7-3). 


The PWT and PCD bits for a bus cycle are obtained from CR3, the page directory or page table 
entry. These bits are assumed to be zero during Real Mode, whenever paging is disabled, or for 
cycles that bypass paging (I/O references, interrupt acknowledge cycles, and HALT cycles). 
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When paging is enabled, the bits from the page table entry are cached in the TLB, and are driven 
when the page mapped by the TLB entry is referenced. For normal memory cycles, PWT and 
PCD are taken from the page table entry. During TLB refresh cycles in which the page table and 
directory entries are read, the PWT and PCD bits must be obtained elsewhere. During page table 
updates the bits are obtained from the page directory. When the page directory is updated, these 
bits are obtained from CR3. PCD and PWT bits are initialized to zero at reset, but can be modified 
by level 0 software. 


FLUSH# 
Cache Control Logic 
KEN# 
Linear 
Address EeP > 
PWT 
> 


Di 
Control Registers irectory 


(From CRO) 


Figure 7-3. Page Cacheability 
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7.6.1. Write-Back Enhanced IntelDX4™ Processor and Processor Page 
Cacheability 


In Write-Back Enhanced IntelDX4 processor-based systems, both the processor and the system 
hardware must determine the cacheability and the configuration (write-back or write-through) on 
a line-by-line basis. The system hardware’s cacheability is determined by KEN# and the config- 
uration by WB/WT#. The processor’s indication of cacheability is determined by PCD and the 
configuration by PWT. The PWT bit controls the write policy for the second-level caches used 
with the Write-Back Enhanced IntelDX4 processor. Setting PWT to | defines a write-through 
policy for the current page, while clearing PWT to 0 defines a write-back policy for the current 


page. 


7.7 CACHE FLUSHING 


The on-chip cache can be flushed by external hardware or by software instructions. Flushing the 
cache clears all valid bits for all lines in the cache. The cache is flushed when external hardware 
asserts the FLUSH# pin. 


The FLUSH# pin must to be asserted for one clock when driven synchronously or for two clocks 
when driven asynchronously. FLUSH# is asynchronous, but setup and hold times must be met 
for recognition in a particular cycle. FLUSH# should be deasserted before the cache flush is com- 
plete. Failure to deassert the pin causes execution to stop as the processor repeatedly flushes the 
cache. When external hardware activates FLUSH# in response to an I/O write, FLUSH# must be 
asserted for at least two clocks prior to ready being returned for the I/O write. This ensures that 
the flush completes before the processor begins execution of the instruction following the OUT 
instruction. 


The instructions INVD and WBINVD cause the on-chip cache to be flushed. External caches 
connected to the Intel486 processor are signaled to flush their contents when these instructions 
are executed. 


WBINVD also cause an external write-back cache to write back dirty lines before flushing its 
contents. The external cache is signaled using the bus cycle definition pins and the byte enables. 
Refer to Section 9.2.6, “Bus Cycle Definition,” for the bus cycle definition pins and Section 
10.3.11, “Special Bus Cycles,” for special bus cycles. Refer to the Intel486™ Processor Pro- 
grammers Reference Manual for detailed instruction definitions. 


The results of the INVD and WBINVD instructions are identical for the operation of the non- 
write-back enhanced IntelDX4 processor on-chip cache because the cache is write-through. 
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7.7.1. Write-Back Enhanced IntelDX4™ Processor Cache Flushing 
The on-chip cache can be flushed by external hardware or by software instructions. 


Flushing the cache through hardware is accomplished by asserting the FLUSH# pin. This causes 
the cache to write back all modified lines in the cache and mark the state bits invalid. The first 
flush acknowledge cycle is driven by the Write-Back Enhanced IntelDX4 processor, followed by 
the second flush acknowledge cycle after all write-backs and invalidations are complete. The two 
special cycles are issued even when there are no dirty lines to write back. 


The INVD and WBINVD instructions cause the on-chip cache to be invalidated. WBINVD caus- 
es the modified lines in the internal cache to be written back, and all lines to be marked invalid. 
After execution of the WBINVD instruction, the write-back and flush special cycles are driven 
to indicate to external cache that it should write back and invalidate its contents. These two spe- 
cial cycles are issued even when there are no dirty lines to be written back. INVD causes all lines 
in the cache to be invalidated, so modified lines in the cache are not written back. The Flush spe- 
cial cycle is driven after the INVD instruction is executed to indicate to any external cache that it 
should invalidate its contents. Care should be taken when using the INVD instruction to avoid 
creating cache consistency problems. 
NOTE 


It is recommended to use the WBINVD instruction instead of the INVD 
instruction when the on-chip cache is configured in write-back mode. 


The assertion of RESET invalidates the entire cache without writing back the modified lines. No 
special cycles are issued after the invalidation is complete. 


Snoop cycles with invalidation (INV=1) cause the Write-Back Enhanced IntelDX4 processor to 
invalidate an individual cache line. When the snooped line is a modified line, then the processor 
schedules a write-back cycle. Inquire cycles with no-invalidation cause the Write-Back Enhanced 
IntelDX4 processor only to write-back the line, when the inquired line is in M-state, and not in- 
validate the line. 


SRESET, STPCLK#, INTR, NMI and SMI# are recognized and latched, but not serviced during 
the full-cache, modified-line write-backs, caused either by the WBINVD instruction or by 
FLUSH#. However, BOFF#, AHOLD and HOLD are recognized during the full-cache, modi- 
fied-line write-backs. 
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7.8  WRITE-BACK ENHANCED IntelDX4™ PROCESSOR WRITE-BACK CACHE 
ARCHITECTURE 


This section describes additional features pertaining to the write-back mode of the Write-Back 
Enhanced IntelDX4 processor. 


7.8.1. Write-Back Cache Coherency Protocol 


The Write-Back Enhanced IntelDX4 processor cache protocol supports a cache line in one of the 
following four states: 


¢ The line is valid and defined as write-back during allocation (E-state) 

¢ The line is valid and defined as write-through during allocation (S-state) 
¢ The line has been modified (M-state) 

¢ The line it is invalid (I-state) 


These four states are the M (Modified line), E (write-back line), S (write-through line) and I (In- 
valid line) states, and the protocol is referred to as the “Modified MEST protocol.” A definition 
of the states is given below: 


M - Modified: An M-state line is modified (different from main memory) and can be accessed 
(read/written to) without sending a cycle out on the bus. 


E - Exclusive: An E-state line is a ‘write-back’ line, but the line is not modified (i.e., it is 
consistent with main memory). An E-state line can be accessed (read/written to) 
without generating a bus cycle and a write to an E-state line causes the line to 
become modified. 


S- Shared: An S-state line is a ‘write-through’ line, and is consistent with main memory. A 
read hit to an S-state line does not generate bus activity, but a write hit to an S- 
state line generates a write-through cycle on the bus. A write to an S-state line 
updates the cache and the main memory. 


I - Invalid: This state indicates that the line is not in the cache. A read to this line is a miss 
and may cause the Write-Back Enhanced IntelDX4 processor to execute a line 
fill G.e., fetch the whole line into the cache from main memory). A write to an 
invalid line causes the Write-Back Enhanced IntelDX4 processor to execute a 
write-through cycle on the bus. 
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Every line in the Write-Back Enhanced IntelDX4 processor cache is assigned a state that depends 
on both Write-Back Enhanced IntelDX4 processor-generated activities and activities generated 
by the system hardware. As the Write-Back Enhanced IntelDX4 processor is targeted for unipro- 
cessor systems, a subset of MESI protocol, namely METI, is used to maintain cache coherency. 


With the modified MESI protocol it is assumed that in a uniprocessor system, lines are defined 
as write-back or write-through at allocation time. This property associated with a line is never al- 
tered. The lines allocated as write-through go to S-state and remain in S-state. A cache line that 
is allocated as write-back never enters the S-state. The WB/WT# pin is sampled during line allo- 
cation and is used strictly to characterize a line as write-back or write-through. 


State Transition Tables 


State transitions are caused by processor-generated transactions (memory reads/writes) and by a 
set of external input signals and internally-generated variables. The Write-Back Enhanced 
IntelDX4 processor also drives certain pins as a consequence of the cache consistency protocol. 


Read Cycles 
Table 7-5 shows the state transitions for lines in the cache during unlocked read cycles. 
Write Cycles 


The state transitions of cache lines during Write-Back Enhanced IntelDX4 processor-generated 
write cycles are described in Table 7-6. 
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Table 7-5. Cache State Transitions for Write-Back Enhanced IntelDX4™ Processor-Initiated 
Unlocked Read Cycles 


Present State Pin Activity Next State Description 
M n/a M Read hit; data is provided to processor core by cache. 
No bus cycle is generated. 
E n/a E Read hit; data is provided to processor core by cache. 
No bus cycle is generated. 
Ss n/a S$ Read hit; Data is provided to the processor by the 
cache. No bus cycle is generated. 
| CACHE# low E Data item does not exist in cache (MISS). 
AND 
KEN# low 
AND 
WB/WT# high 
AND 
PWT low 
| CACHE# low S) Same as previous read miss case except that WB/WT# 
AND is sampled low with first BRDY# or PWT is high. 
KEN# low 
AND 
(WB/WT# low 
OR PWT high) 
| CACHE# high KEN# pin inactive; the line is not intended to be cached 
OR in the Write-Back Enhanced IntelDX4 processor. 
KEN# high 
NOTES: 


1. Locked accesses to the cache cause the accessed line to transition to the Invalid state. 

2. PCD can also be used by the processor to determine the cacheability, but using the CACHE# pin is 
recommended. The transition from | to E or S states (based on WB/WT#) occurs only when KEN# is 
sampled low one clock prior to the first BRDY# and then one clock prior to the last BRDY#, and the cycle 
is transformed into a line fill cycle. When KEN# is sampled high, the line is not cached and remains in the 


| state. 
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Table 7-6. Cache State Transitions for Write-Back Enhanced IntelDX4™ 
Processor-Initiated Write Cycles 


Present Pin Next ae 
State Activity State Description 

M n/a M Write hit; update cache. No bus cycle generated to update memory. 

E n/a M Write hit; update cache only. No bus cycle generated; line is now 
modified. 

iS) n/a iS) Write hit; cache updated with write data item. A write-through cycle is 
generated on the bus to update memory. Subsequent writes to E-state 
or M-state lines are held up until this write through cycle is completed. 

| n/a Write miss; a write-through cycle is generated on the bus to update 
external memory. No allocation is done. Subsequent writes to the E or 
M lines are blocked until the write miss is completed. 


Note that even though memory writes are buffered while I/O writes are not, these writes appear 
at the pins in the same order as they were generated by the processor. Write-back cycles caused 
by the replacement of M-state lines are buffered, while write backs due to snoop hit to M-state 
lines are not buffered. 


Cache Consistency Cycles (Snoop Cycles) 


The purpose of snoop cycles is to check whether the address being presented by another bus mas- 
ter is contained within the cache of the Write-Back Enhanced IntelDX4 processor. Snoop cycles 
may be initiated with or without an invalidation request (INV = | or 0). When a snoop cycle is 
initiated with INV = 0 (usually during memory read cycles by another master), it is referred to as 
an inquire cycle. When a snoop cycle is initiated with INV = | (usually during memory write cy- 
cles), it is referred to as an invalidate cycle. When the address hits a modified line in the cache, 
HITM# is asserted and the modified line is written back to the bus. Table 7-7 describes state tran- 
sitions for snoop cycles. 


Table 7-7. Cache State Transitions During Snoop Cycles 


Present | Next State} Next State ar 
State | INV=1 INV=0 Beseription 

M I E Snoop hit to a modified line indicated by HITM# low. The state of the 
line changes to E provided INV = 0 and changes to | when INV = 1. 

E I E Snoop hit, no bus cycle generated. State remains unaltered when INV 
= 0, and changes to | when INV = 1. There is no external indication of 
this snoop hit. 

Ss Ss Snoop hit, no bus cycle generated. State remains unaltered when INV 
= 0, and changes to | when INV = 1. There is no external indication of 
this snoop hit. 

Address not in cache. 
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7.8.2 Detecting On-Chip Write-Back Cache of the Write-Back Enhanced 
IntelDX4™ Processor 


The Write-Back Enhanced IntelDX4 processor write-back policy for the on-chip cache can be de- 
tected by software or hardware. The software mechanism uses the CPUID instruction. (See Ap- 
pendix B, “Feature Determination,” for use of the CPUID instruction.) The hardware mechanism 
uses a write-back related output signal from the processor. 


A software mechanism to determine whether a processor has write-back support for the on-chip 
cache should drive the WB/WT# pin to ‘1’ during RESET. This pin is sampled by the processor 
during the falling edge of RESET. Execute the CPUID instruction, which returns the model num- 
ber in the EAX register, EAX[7:4]. When the model number returned is 7 (identifying the pres- 
ence of a Write-Back Enhanced IntelDX4 processor) and the family number is 4, the on-chip 
cache supports the write-back policy. When the model number returned is in the range 0 through 
6 or 8, the on-chip cache supports the write-through policy only. 


The following pseudo code/steps give an example of the initialization BIOS that can detect the 
presence of the write-back on-chip cache: 


¢ Boot address cold start 

¢ Load segment registers and null IDTR 

¢ Execute CPUID instruction and determine the family ID and model ID. 

¢ Compare the family ID to 4 and the Model ID to the values listed in Table D-2 (pg. D-3). 


The hardware mechanism for detecting the presence of write-back cache uses the HITM# signal. 
For the Write-Back Enhanced IntelDX4 processor, this signal is driven inactive (high) during 
RESET. The chipset can sample this output on the falling edge of RESET. When HITM# is sam- 
pled high on the falling edge of RESET, the processor supports on-chip write-back cache config- 
uration. For those processors that do not support internal write-back caching, this signal is an 
INC, and this output is not driven. 
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CHAPTER 8 
SYSTEM MANAGEMENT MODE (SMM) 
ARCHITECTURES 


8.1 SMM OVERVIEW 


The Intel486™ processor supports four modes: Real, Virtual-86, Protected, and System Manage- 
ment Mode (SMM). As an operating mode, SMM has a distinct processor environment, interface 
and hardware/software features. 


SMM provides system designers with a means of adding new software-controlled features to 
computer products that operate transparently to the operating system and software applications. 
SMM is intended for use only by system firmware, not by applications software or general pur- 
pose systems software. 


The SMM architectural extension consists of the following elements: 
1. System Management Interrupt (SMI#) hardware interface. 


2. Dedicated and secure memory space (SMRAM) for SMI# handler code and processor 
state (context) data with a status signal (SMIACT#) for to decoding access to that memory 
space. (The SMBASE address is relocatable and can also be relocated to non-cacheable 
address space.) 


Resume (RSM) instruction, for exiting the System Management Mode. 


4. Special Features such as I/O-Restart, for transparent power management of I/O 
peripherals, and Auto HALT Restart. 


8.2 TERMINOLOGY 


The following terms are used throughout the discussion of System Management Mode. 


SMM System Management Mode. This is the operating environment that 
the processor (system) enters when the System Management 
Interrupt is being serviced. 


SMI# System Management Interrupt. This is part of the SMM interface. 
When SMIf# is asserted (low) it causes the processor to invoke SMM. 
The SMI# pin is the only means of entering SMM. 


SMM Handler System Management Mode handler. This is the code that is executed 
when the processor is in SMM. An example application that this 
code might implement is a power management control or a system 
control function. 
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RSM 


SMRAM 


SMBASE 


Context 


Context Switch 


Resume instruction. This instruction is used by the SMM handler to 
exit SMM and return to the operating system or application process 
that was interrupted. 


Physical memory dedicated to SMM. The SMM handler code and 
related data reside in this memory. This memory is also used by the 
processor to store its context before executing the SMM handler. The 
operating system and applications do not have access to this memory 
space. 


Control register that contains the address of the SMRAM space. 


The processor state just before the processor invokes SMM. The 
context normally consists of the processor registers that fully 
represent the processor state. 


The process of either saving or restoring the context. The SMM 
discussion refers to the context switch as the process of 
saving/restoring the context while invoking/exiting SMM, 
respectively. 


8.3. SYSTEM MANAGEMENT INTERRUPT PROCESSING 


The system interrupts the normal program execution and invokes SMM by generating a System 
Management Interrupt (SMI#) to the processor. The processor services the SMI# by executing 
the following sequence (see Figure 8-1): 


ds 


8-2 


The processor asserts SMIACT#, indicating to the system that it should enable the 


SMRAM. 


The processor saves its state (context) to SMRAM, starting at default address location 
3FFFFH, proceeding downward in a stack-like fashion. 


The processor switches to the System Management Mode processor environment (a 


pseudo-real mode). 
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SMI# 
#1 #20 #8 | #4 #5 
I 
State Slave SMM Handler State Resume 
I 
I 
I 
I 
I 
SMi# 
SMIACT# 
RSM 
f f f 
Flush cache Cache must Cache must 
be empty be empty 
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Figure 8-1. Basic SMI# Interrupt Service 


4. The processor then jumps to the default absolute address of 38000H in SMRAM to 
execute the SMI# handler. This SMI# handler performs the system management activities. 


5. The SMI# handler then executes the RSM instruction (which restores the processors 
context from SMRAM), de-asserts the SMIACT# signal, and then returns control to the 
previously interrupted program execution. 


NOTE 
The above sequence is valid for the default SMBASE value only. See the 
following sections for a description of the SMBASE register and SMBASE 
relocation. 


The System Management Interrupt hardware interface consists of the SMI# interrupt request in- 
put and the SMIACT# output the system uses to decode the SMRAM. 


ow | \ SMI Interface 


Figure 8-2. Basic SMI# Hardware Interface 
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8.3.1. System Management Interrupt (SMI#) 


SMI# is a falling-edge triggered, non-maskable interrupt request signal. SMI# is an asynchronous 
signal, but setup and hold times ty9 and tp; must be met in order to guarantee recognition on a 
specific clock. The SMI# input need not remain active until the interrupt is actually serviced. The 
SMI# input must remain active for a single clock if the required setup and hold times are met. 
SMI# also works correctly if it is held active for an arbitrary number of clocks. 


The SMI# input must be held inactive for at least four external clocks after it is asserted to reset 
the edge triggered logic. A subsequent SMI# might not be recognized if the SMI# input is not 
held inactive for at least four clocks after being asserted. 


SMI#, like NMI, is not affected by the IF bit in the EFLAGS register and is recognized on an 
instruction boundary. An SMI# does not break locked bus cycles. The SMI# has a higher priority 
than NMI and is not masked during an NMI. In order for SMI# to be recognized with respect to 
SRESET, SMI# should not be asserted until two (2) clocks after SRESET becomes inactive. 


After the SMI# interrupt is recognized, the SMI# signal is masked internally until the RSM in- 
struction is executed and the interrupt service routine is complete. Masking the SMI# prevents 
recursive SMI# calls. SMI# must be deasserted for at least four clocks to reset the edge triggered 
logic. If another SMI# occurs while the SMI# is masked, the pending SMI# is recognized and 
executed on the next instruction boundary after the current SMI# completes. This instruction 
boundary occurs before execution of the next instruction in the interrupted application code, re- 
sulting in back-to-back SMM handlers. Only one SMI# can be pending while SMI# is masked. 


The SMI# signal is synchronized internally and must be asserted at least three CLK periods prior 
to asserting the RD Y# signal in order to guarantee recognition on a specific instruction boundary. 
This is important for servicing an I/O trap with an SMI# handler (see Figure 8-3). 


8.3.2 SMI# Active (SMIACT#) 


SMIACT# indicates that the processor is operating in System Management Mode. The processor 
asserts SMIACT# in response to an SMI# interrupt request on the SMI# pin. SMIACT# is driven 
active after the processor has completed all pending write cycles (including emptying the write 
buffers), and before the first access to SMRAM, when the processor saves (writes) its state (or 
context) to SMRAM. SMIACT# remains active until the last access to SMRAM when the pro- 
cessor restores (reads) its state from SMRAM. SMIACT# does not float in response to HOLD. 
SMIACT# is used by the system logic to decode SMRAM (See Figure 8-2). 


The number of CLKs required to complete the SMM state save and restore is dependent on-sys- 
tem memory performance. The values listed in Table 8-1 assume zero wait-state memory writes 
(two CLK cycles), 2-1-1-1 burst read cycles, and zero wait-state non-burst reads (2 CLK cycles). 
Additionally, it is assumed that the data read during the SMM state restore sequence is not cache- 
able. 
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Note: Setup time (A) for recognition on I/O instruction boundary. 
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Figure 8-3. SMI# Timing for Servicing an I/O Trap 


Figure 8-4 and Table 8-1 can be used for latency calculations. As shown, the minimum time re- 
quired to enter an SMI# handler routine for the IntelDX4 processor (3x clock) from the comple- 
tion of the interrupted instruction is given by: 
Latency to beginning of SMI# handler = 
A+B+C=153 CLKs 


and the minimum time required to return to the interrupted application (following the final SMM 
instruction before RSM) is given by: 


Latency to continue interrupted application = 
E+F+G=243 CLKs 
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A5233-01 
Figure 8-4. Intel486™ Processor SMIACT# Timing 
Table 8-1. Intel486™ Processor SMIACT# Timing 
Intel486™ SX IntelDX2™ IntelDX4™ IntelDX4 
Processor Processor | Processor 3X | Processor 2X 
A: Last RDY# from 2 CLK minimum 1 CLK 1 CLK minimum 1 CLK 
non-SMM transfer to minimum minimum 
SMIACT# assertion 
B: SMIACT# assertion | 40 CLK minimum 20 CLK 13 CLK 20 CLK 
to first ADS# for SMM minimum minimum minimum 
state save 
C:SMM state save Approx. Approx. Approx. Approx. 
(dependent on memory 139 CLKs 139 CLKs 139 CLKs 139 CLKs 
performance) 
D:SMM handler User determined User User User 
determined determined determined 

E: SMM state restore Approx. Approx. Approx. Approx. 
(dependent on memory 236 CLKs 236 CLKs 236 CLKs 236 CLKs 
performance) 
F: Last RDY# from 4 CLK minimum 2 CLK 1 CLK minimum 1 CLK 
SMM transfer to de- minimum minimum 
assertion of SMIACT# 
G: SMIACT# de- 20 CLK minimum 10 CLK 6 CLK minimum 10 CLK 
assertion to first non- minimum minimum 
SMM ADS# 
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8.3.3 SMRAM 


The Intel486 processor uses the SMRAM space for state save and state restore operations during 
an SMI# and RSM. The SMI# handler, which also resides in SMRAM, uses the SMRAM space 
to store code, data and stacks. In addition, the SMI# handler can use the SMRAM for system man- 
agement information such as the system configuration, configuration of a powered-down device, 
and system design-specific information. 


The processor asserts the SMIACT# output to indicate to the memory controller that it is operat- 
ing in System Management Mode. The system logic should ensure that only the processor has 
access to this area. Alternate bus masters or DMA devices that try to access the SMRAM space 
when SMIACT# is active should be directed to system RAM in the respective area. 


The system logic is minimally required to decode the physical memory address range from 
38000H-3FFFFH as SMRAM area. The processor saves its state to the state save area from 
3FFFFH downward to 3FEOOH. After saving its state the processor jumps to the address location 
38000H to begin executing the SMI# handler. The system logic can choose to decode a larger 
area of SMRAM as needed. The size of this SMRAM can be between 32 Kbytes and 4 Gbytes. 


The system logic should provide a manual method for switching the SMRAM into system mem- 
ory space when the processor is not in SMM. This enables initialization of the SMRAM space 
(i.e., loading SMI# handler) before executing the SMI# handler during SMM (see Figure 8-5). 


8.3.3.1 SMRAM State Save Map 


When the SMI# is recognized on an instruction boundary, the processor core first sets SMIACT# 
low, indicating to the system logic that accesses are now being made to the system-defined SM- 
RAM areas. The processor then writes its state to the state save area in the SMRAM. The state 
save area starts at CS Base + [8000H + 7FFFH]. The default CS Base is 30000H; therefore the 
default state save area is at 3FFFFH. In this case, the CS Base can also be referred to as the SM- 
BASE. 


If SMBASE relocation is enabled, then the SMRAM addresses can change. The following for- 
mula is used to determine the relocated addresses where the context is saved. The context resides 
at CS Base + [8000H + Register Offset], where the default initial CS Base is 30000H and the Reg- 
ister Offset is listed in the SMRAM state save map (Table 8-2). Reserved spaces are used to ac- 
commodate new registers in future processors. The state save area starts at 7FFFH and continues 
downward in a stack-like fashion. 


Some of the registers in the SMRAM state save area may be read and changed by the SMI# han- 
dler, with the changed values restored to the processor registers by the RSM instruction. Some 
register images are read-only, and must not be modified (modifying these registers results in un- 
predictable behavior). The values stored in reserved areas may change in future processors. An 
SMM handler should not rely on any values stored in a reserved area. 
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space used for redirected to 
loading SMRAM SMRAM 


Figure 8-5. Redirecting System Memory Addresses to SMRAM 


Table 8-2. SMRAM State Save Map (Sheet 1 of 2) 


Register Offset Register Writeable?? 
7FFC CRO NO 
7FF8 CR3 NO 
7FF4 EFLAGS YES 
7FFO EIP YES 
7FEC EDI YES 
7FE8 ESI YES 
7FE4 EBP YES 
7FEO ESP YES 
7FDC EBX YES 
7FD8 EDX YES 
7FD4 ECX YES 
7FDO EAX YES 

NOTES: 


1. Upper two bytes are reserved. 

2. Modifying a value that is marked as not writeable results in 
unpredictable behavior. 

3. Words are stored in two consecutive bytes in memory with the 
low-order byte at the lowest address and the high-order byte at 
the high address. 
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Table 8-2. SMRAM State Save Map (Sheet 2 of 2) 


Register Offset Register Writeable?? 
7FCC DR6 NO 
7FC8 DR7 NO 
7FC4 TR! NO 
7FCO LDTR' NO 


7FB4 
7FBO 
7FAG 
7FA8 
7FA7—7F98 Reserved 
7F94 IDT Base 
7F93—-7F8C Reserved 
7F88 GDT Base 
7F87-7F04 Reserved 
7F02 Auto HALT Restart Slot (Word)? YES 
7F00 /O Trap Restart Slot (Word)? YES 
7EFC SMM Revision Identifier NO 
(Dword)3 
7EF8 SMBASE Slot (Dword)? YES 
7EF7—7E00 Reserved NO 
NOTES: 


1. Upper two bytes are reserved. 

2. Modifying a value that is marked as not writeable results in 
unpredictable behavior. 

3. Words are stored in two consecutive bytes in memory with the 
low-order byte at the lowest address and the high-order byte at 
the high address. 


The following registers are saved and restored (in reserved areas of the state save), but are not 
visible to the system software programmer: CR1, CR2, and CR4, hidden descriptor registers for 
CS, DS, ES, FS, GS, and SS. 


If an SMI# request is issued for the purpose of powering down the processor, the values of all 
reserved locations in the SMM state save must be saved to non-volatile memory. 


The following registers are not automatically saved and restored by SMI# and RSM: DRS:0, 
TR7:3, and the FPU registers STn, FCS, FSW, tag word, FP instruction pointer, FP opcode, and 
operand pointer. 
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For all SMI# requests except for suspend/resume, these registers do not have to be saved because 
their contents do not change. However, during a power down suspend/resume, a resume reset 
clears these registers to their default values. In this case, the suspend SMI# handler should read 
these registers directly to save them and restore them during the power up resume. Anytime the 
SMI# handler changes these registers in the processor, it must also save and restore them. 


8.3.4 Exit From SMM 


The RSM instruction is only available to the SMI# handler. The opcode of the instruction is 
OFAAH. Execution of this instruction while the processor is executing outside of SMM causes 
an invalid opcode error. The last instruction of the SMI# handler is the RSM instruction. 


The RSM instruction restores the state save image from SMRAM back to the processor, then re- 
turns control back to the interrupted program execution. There are three SMM features that can 
be enabled by writing to control “slots” in the SMRAM state save area. 


Auto HALT Restart. It is possible for the SMI# request to interrupt the HALT state. The SMI# 
handler can tell the RSM instruction to return control to the HALT instruction or to return control 
to the instruction following the HALT instruction by appropriately setting the Auto HALT Re- 
start slot. The default operation is to restart the HALT instruction. 


I/O Trap Restart. If the SMI# interrupt was generated on an I/O access to a powered-down de- 
vice, the SMI# handler can tell the RSM instruction to re-execute that I/O instruction by setting 
the I/O Trap Restart slot. 


SMBASE Relocation. The system can relocate the SMRAM by setting the SMBASE Relocation 
slot in the state save area. The RSM instruction sets the SMBASE in the processor based on the 
value in the SMBASE Relocation slot. The SMBASE must be 32-Kbyte aligned. 


For further details on these SMM features, see Section 8.5 “SMM Features.” 


If the processor detects invalid state information, it enters the shutdown state. This happens only 
in the following situations: 


¢ The value stored in the SMBASE slot is not a 32-Kbyte aligned address. 
¢ A reserved bit of CR4 is set to 1. 
¢ A combination of bits in CRO is illegal; namely, (PG=1 and PE=0) or (NW=1 and CD=0). 


In shutdown mode, the processor stops executing instructions until an NMI interrupt is received 
or reset initialization is invoked. The processor generates a special bus cycle to indicate it has en- 
tered shutdown mode. 


NOTE 
INTR and SMI# also brings the processor out of a shutdown that is 
encountered due to invalid state information from SMM execution. Make sure 
that INTR and SMI# are not asserted if SMM routines are written such that a 
shutdown occurs. 
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8.4 SYSTEM MANAGEMENT MODE PROGRAMMING MODEL 


8.4.1. Entering System Management Mode 


SMM is one of the major operating modes, on a level with Protected Mode, Real Mode or Virtual- 
86 Mode. Figure 8-6 shows how the processor can enter SMM from any of the three modes and 
then return. 


Real Mode 


Reset or RSM SMl# 
PE=1 


SMli# System 
Management 
Mode 


Reset or 
PE=0 


Reset 


Protected Mode 


Virtual - 86 Mode 


Note: Reset could occur by asserting the RESET or SRESET pin. 
A5234-01 


Figure 8-6. Transition to and from System Management Mode 


The external signal SMI# causes the processor to switch to SMM. The RSM instruction exits 
SMM. SMM is transparent to applications programs and operating systems because of the fol- 
lowing: 
¢ The only way to enter SMM is via a type of non-maskable interrupt triggered by an external 
signal. 


¢ The processor begins executing SMM code from a separate address space, called system 
management RAM (SMRAM). 


¢ Upon entry into SMM, the processor saves the register state of the interrupted program in a 
part of SMRAM called the SMM context save space. 


¢ All interrupts normally handled by the operating system or by applications are disabled 
upon entry into SMM. 


¢ A special instruction, RSM, restores processor registers from the SMM context save space 
and returns control to the interrupted program. 


SMM is similar to Real Mode in that there are no privilege levels or address mapping. An SMM 
program can execute all I/O and other system instructions and can address up to 4 Gbytes of 
memory. 
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8.4.2. Processor Environment 


When an SMI# signal is recognized on an instruction execution boundary, the processor waits for 
all stores to complete, including emptying of the write buffers. The final write cycle is complete 
when the system returns RDY# or BRDY#. The processor then drives SMIACT# active, saves 
its register state to SMRAM space, and begins to execute the SMM handler. 


SMI# has greater priority than debug exceptions and external interrupts. This means that if more 
than one of these conditions occur at an instruction boundary, only the SMI# processing occurs, 
not a debug exception or external interrupt. Subsequent SMI# requests are not acknowledged 
while the processor is in SMM. The first SMI# interrupt request that occurs while the processor 
is in SMM is latched and serviced when the processor exits SMM with the RSM instruction. The 
processor latches only one SMI# while it is in SMM. 


When the processor invokes SMM, the processor core registers are initialized as shown in 
Table 8-3. 


Table 8-3. SMM Initial Processor Core Register Settings 


Register Contents 
General Purpose Unpredictable 

Registers 

EFLAGS 00000002H 


EIP 00008000H 
CS Selector 3000H 


CS Base SMM Base 
(default 30000H) 


DS, ES, FS, GS, 0000H 
SS Selectors 


DS, ES, FS, GS, 000000000H 
SS Bases 


DS, ES, FS, GS, OFFFFFFFFH 
SS Limits 


CRO Bits 0,2,3 & 31 cleared (PE, 
EM, TS & PG); others are 
unmodified 


Unpredictable 


00000000H 
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The following is a summary of the key features in the SMM environment: 
1. Real Mode style address calculation. 

4-Gbyte limit checking. 

IF flag is cleared. 

NMI is disabled. 

TF flag in EFLAGS is cleared; single step traps are disabled. 

DR7 is cleared, except for bits 12 and 13; debug traps are disabled. 


a2 A ea 


The RSM instruction no longer generates an invalid opcode error. 
8. Default 16-bit opcode, register and stack use. 


All bus arbitration (HOLD, AHOLD, BOFF#) inputs and bus sizing (BS8#, BS 16#) inputs oper- 
ate normally while the processor is in SMM. 


8.4.2.1 Write-Back Enhanced IntelDX4™ Processor Environment 


When the Write-Back Enhanced Intel486 processor is in Enhanced Bus Mode, SMI# has greater 
priority than debug exceptions and external interrupts, except for FLUSH# and SRESET (see 
Section 3.15.6 “Interrupt and Exception Priorities”). 


8.4.3. Executing System Management Mode Handler 


The processor begins execution of the SMM handler at offset 8000H in the CS segment. The CS 
Base is initially 30000H. However, the CS Base can be changed by using the SMM Base reloca- 
tion feature. 


When the SMM handler is invoked, the processors PE and PG bits in CRO are reset to 0. The pro- 
cessor is in an environment similar to Real mode, but without the 64-Kbyte limit checking. How- 
ever, the default operand size and the default address size are set to 16 bits. 


The EM bit is cleared so that no exceptions are generated. (If the SMM was entered from Protect- 
ed Mode, the Real Mode interrupt and exception support is not available.) The SMI# handler 
should not use floating-point unit instructions until the FPU is properly detected (within the SMI# 
handler) and the exception support is initialized. 


Because the segment bases (other than CS) are cleared to 0 and the segment limits are set to 4 
Gbytes, the address space may be treated as a single flat 4-Gbyte linear space that is unsegmented. 
The processor is still in Real Mode and when a segment selector is loaded with a 16-bit value, 
that value is then shifted left by 4 bits and loaded into the segment base cache. The limits and 
attributes are not modified. 


In SMM, the processor can access or jump anywhere within the 4-Gbyte logical address space. 
The processor can also indirectly access or perform a near jump anywhere within the 4-Gbyte 
logical address space. 
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8.4.3.1 Exceptions and Interrupts within System Management Mode 


When the processor enters SMM, it disables INTR interrupts, debug and single-step traps by 
clearing the EFLAGS, DR6 and DR7 registers. This prevents a debug application from acciden- 
tally breaking into an SMM handler. This is necessary because the SMM handler operates from 
a distinct address space (SMRAM), and hence, the debug trap does not represent the normal sys- 
tem memory space. 


If an SMM handler wishes to use the debug trap feature of the processor to debug SMM handler 
code, it must first ensure that an SMM-compliant debug handler is available. The SMM handler 
must also ensure DR3:0 is saved to be restored later. The debug registers DR3:0 and DR7 must 
then be initialized with the appropriate values. 


If the processor wishes to use the single step feature of the processor, it must ensure that an SMM 
compliant single step handler is available and then set the trap flag in the EFLAGS register. 


If the system design requires the processor to respond to hardware INTR requests while in SMM, 
it must ensure thatan SMM compliant interrupt handler is available and then set the interrupt flag 
in the EFLAGS register (using the STI instruction). Software interrupts are not blocked upon en- 
try to SMM, and the system software designer must provide an SMM compliant interrupt handler 
before attempting to execute any software interrupt instructions. Note that in SMM mode, the in- 
terrupt vector table has the same properties and location as the Real Mode vector table. 


NMI interrupts are blocked upon entry to the SMM handler. If an NMI request occurs during the 
SMM handler, it is latched and serviced after the processor exits SMM. Only one NMI request is 
latched during the SMM handler. If an NMI request is pending when the processor executes the 
RSM instruction, the NMI is serviced before the next instruction of the interrupted code se- 
quence. 


Although NMI requests are blocked when the processor enters SMM, they may be enabled 
through software by executing an IRET instruction. If the SMM handler requires the use of NMI 
interrupts, it should invoke a dummy interrupt service routine for the purpose of executing an 
IRET instruction. Once an IRET instruction is executed, NMI interrupt requests are serviced in 
the same “Real Mode” manner in which they are handled outside of SMM. 
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8.5 SMM FEATURES 


8.5.1. SMM Revision Identifier 


The SMM revision identifier is used to indicate the version of SMM and the SMM extensions 
supported by the processor. The SMM revision identifier is written during SMM entry and can 
be examined in SMRAM space at register offset 7EFCH. The lower word of the SMM revision 
identifier refers to the version of the base SMM architecture. The upper word of the SMM revi- 
sion identifier refers to the extensions available (see Figure 8-7). 


17 16 
Register Offset 
Intel Reserved 7EFCH 


SMBASE Relocation rae T 


SMM Revision Level 
\/O Trap with Restart 


Figure 8-7. SMM Revision Identifier 


Table 8-4. Bit Values for SMM Revision Identifier 


Bits Value Comments 

16 0 Processor does not support I/O trap restart 

16 1 Processor supports I/O trap restart 

17 0 Processor does not support SMBASE relocation 
17 1 Processor supports SMBASE relocation 


Bit 16 of the SMM revision identifier is used to indicate to the SMM handler that this processor 
supports the SMM I/O trap extension. If this bit is high, then the processor supports the SMM I/O 
trap extension. If this bit is low, then this processor does not support I/O trapping using the I/O 
trap slot mechanism (see Table 8-4). 


Bit 17 of this slot indicates whether the processor supports relocation of the SMM jump vector 
and the SMRAM base address (see Table 8-4). 


The Intel486 processor supports both the I/O trap restart and the SMBASE relocation features. 
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8.5.2 Auto Halt Restart 


The Auto HALT restart slot at register offset (word location) 7F02H in SMRAM indicates to the 
SMM handler that the SMI# interrupted the processor during a HALT state (bit 0 of slot 7FO2H 
is set to | if the previous instruction was a HALT). If the SMI# does not interrupt the processor 
in a HALT state, then the SMI# microcode sets bit 0 of the Auto HALT Restart slot to a value of 
0. If the previous instruction was a HALT, the SMM handler can choose to either set or reset bit 
0. If this bit is set to 1, the RSM microcode execution forces the processor to re-enter the HALT 
state. If this bit is set to 0 when the RSM instruction is executed, the processor continues execu- 
tion starting with the instruction just after the interrupted HALT instruction. Note that if the in- 
terrupted instruction was not a HALT instruction (bit 0 is set to 0 in the Auto HALT restart slot 
upon SMM entry), setting bit 0 to 1 causes unpredictable behavior when the RSM instruction is 
executed (see Figure 8-8 and Table 8-5). 


Register Offset 
Intel Reserved Sey 


Auto HALT 
Restart 


Figure 8-8. Auto HALT Restart 


Table 8-5. Bit Values for Auto HALT Restart 


Value of | Value of 
Bit 0 at Bit 0 at Comments 
Entry Exit 
0 0 Returns to next instruction in interrupted program. 
0 1 Unpredictable. 
1 0 Returns to next instruction after HALT. 
1 1 Returns to HALT state. 


If the HALT instruction is restarted, the processor generates a memory access to fetch the HALT 
instruction (if it is not in the internal cache) and executes a HALT bus cycle. 
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8.5.3 I/O Instruction Restart 


The I/O instruction restart slot (register offset 7FOOH in SMRAM) gives the SMM handler the 
option of causing the RSM instruction to automatically re-execute the interrupted I/O instruction. 
When the RSM instruction is executed, if the I/O instruction restart slot contains the value OFFH, 
then the processor automatically re-executes the I/O instruction that the SMI# trapped. If the I/O 
instruction restart slot contains the value 0OH when the RSM instruction is executed, then the pro- 
cessor does not re-execute the I/O instruction. The processor automatically initializes the I/O in- 
struction restart slot to OOH during SMM entry. The I/O instruction restart slot should be written 
only when the processor has generated an SMI# on an I/O instruction boundary. Processor oper- 
ation is unpredictable when the I/O instruction restart slot is set when the processor is servicing 
an SMI# that originated on a non-I/O instruction boundary (see Figure 8-9 and Table 8-6). 


15 0 


Register Offset 
7FOOH 


1/O Instruction 
Restart Slot 


Figure 8-9. I/O Instruction Restart 


Table 8-6. I/O Instruction Restart Value 


Value at Entry Value at Exit Comments 
00H 00H Do not restart trapped I/O instruction 
00H OFFH Restart trapped I/O instruction 


If the system executes back-to-back SMI# requests, the second SMM handler must not set the I/O 
instruction restart slot (see Section 8.6.6 “Nested SMI#s and I/O Restart’’). 


8.5.4 SMM Base Relocation 


The Intel486 processor provides a control register, SMBASE. The address space used as 
SMRAM can be modified by changing the SMBASE register before exiting an SMI# handler 
routine. SMBASE can be changed to any 32-Kbyte aligned value (values that are not 32-Kbyte 
aligned cause the processor to enter the shutdown state when executing the RSM instruction). 
SMBASE is set to the default value of 30000H on RESET, but is not changed on SRESET. If the 
SMBASE register is changed during an SMM handler, all subsequent SMI# requests initiate a 
state save at the new SMBASE (see Figure 8-10). 
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31 0 


Register Offset 
7EF8H 


—— a SMM Base 


Figure 8-10. SMM Base Location 


The SMBASE slot in the SMM state save area is used to indicate and change the SMI# jump vec- 
tor location and the SMRAM save area. When bit 17 of the SMM revision identifier is set, then 
this feature exists and the SMRAM base and jump vector are as indicated by the SMM base slot. 
During the execution of the RSM instruction, the processor reads this slot and initializes the pro- 
cessor to use the new SMBASE during the next SMI#. During an SMI#, the processor performs 
a context save to the new SMRAM area pointed to by the SMBASE, stores the current SMBASE 
in the SMM Base slot (offset 7EF8H), and then start execution of the new jump vector based on 
the current SMBASE. 


The SMBASE must be a 32-Kbyte aligned, 32-bit integer that indicates a base address for the 
SMRAM context save area and the SMI# jump vector. For example when the processor first pow- 
ers up, the minimum SMRAM area is from 38000H-3FFFFH. The default SMBASE is 30000H. 
Hence the starting address of the jump vector is calculated by: 


SMBASE + 8000H 
While the starting address for the SMRAM state save area is calculated by: 
SMM Base + [8000H + 7FFFH] 


Hence, when this feature is enabled, the SMRAM register map is addressed according to the 
above formulas (see Figure 8-11). 


To change the SMRAM base address and SMM jump vector location, the SMM handler should 
modify the SMBASE slot. Upon executing an RSM instruction, the processor reads the SMBASE 
slot and stores it internally. Upon recognition of the next SMI# request, the processor uses the 
new SMBASE slot for the SMRAM dump and SMI# jump vector. 


If the modified SMBASE slot does not contain a 32-Kbyte aligned value, the RSM microcode 
causes the processor to enter the shutdown state. 
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SMRAM 


SMBASE + 8000H | Siart of State Slave 
+ 7FFFH 


SMBASE + 8000H SMM Handler Entry 


SMBASE 


Figure 8-11. SMRAM Usage 


8.6 SMM SYSTEM DESIGN CONSIDERATIONS 


8.6.1 


SMRAM Interface 


The hardware designed to control the SMRAM space must follow these guidelines: 


1. 


A provision should be made to allow for initialization of SMRAM space during system 
boot up. This initialization of SMRAM space must happen before the first occurrence of 
an SMI# interrupt. Initializing the SMRAM space must include installation of an SMM 
handler, and may include installation of related data structures necessary for particular 
SMM applications. The memory controller providing the interface to the SMRAM should 
provide a means for the initialization code to manually open the SMRAM space. 


A minimum initial SMRAM address space of 383000H-3FFFFH should be decoded by the 
memory controller. 


Alternate bus masters (such as DMA controllers) should not be allowed to access 
SMRAM space. Only the processor, either through SMI# or during initialization, should 
be allowed access to SMRAM. 


In order to implement a zero-volt suspend function, the system must have access to all of 
normal system memory from within an SMM handler routine. If the SMRAM is going to 
overlay normal system memory, there must be a method of accessing any system memory 
located underneath SMRAM. 


There are two potential schemes for locating the SMRAM: either overlaid to an address space on 
top of normal system memory, or placed in a distinct address space (see Figure 8-12). When SM- 
RAM is overlaid on top of normal system memory, the processor output signal SMIACT# must 
be used to distinguish SMRAM from main system memory. Additionally, if the overlaid normal 
memory is cacheable, both the processor internal cache and any second-level caches must be 
empty before the first read of an SMM handler routine. If the SMM memory is cacheable, the 
caches must be empty before the first read of normal memory following an SMM handler routine. 
This is done by flushing the caches, and is required to maintain cache coherency. When the de- 
fault SMRAM location is used, SMRAM is overlaid on top of system main memory (at 38000H 
through 3FFFFH). 
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If SMRAM is located in its own distinct memory space, that can be completely decoded using 
only the processor address signals, it is said to be non-overlaid. In this case, there are no new re- 
quirements for maintaining cache coherency. 


SMRAM Normal 
Memory 
SMRAM 
Normal 
Shadowed Region Y/ 

Memory 
Non-overlaid Overlaid 
(no need to (caches must 
flush caches) be flushed) 


Figure 8-12. SMRAM Location 


8.6.2 Cache Flushes 


The processor does not unconditionally flush its cache before entering SMM (this option is left 
to the system designer). If SMRAM is shadowed in a cacheable memory area that is visible to the 
application or operating system, it is necessary for the system to empty both the processor cache 
and any second-level cache before entering SMM. That is, if SMRAM is in the same physical 
address location as the normal cacheable memory space, then an SMM read may hit the cache, 
which would contain normal memory space code/data. If the SMM memory is cacheable, the nor- 
mal read cycles after SMM may hit the cache, which may contain SMM code/data. In this case 
the cache should be empty before the first memory read cycle during SMM and before the first 
normal cycle after exiting SMM (see Figure 8-13). 
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SMI# 
#1 #20 #8 | #4 #5 
I 
State Slave SMM Handler State Resume 
I 
I 
I 
I 
I 
SMi# 
SMIACT# 
RSM 
f f f 
Flush cache Cache must Cache must 
be empty be empty 
A5237-01 


Figure 8-13. FLUSH# Mechanism during SMM 


The FLUSH# and KEN# signals can be used to ensure cache coherency when switching between 
normal and SMM modes. Cache flushing during SMM entry is accomplished by asserting the 
FLUSH# pin when SMI# is driven active. Cache flushing during SMM exit is accomplished by 
asserting the FLUSH# pin after the SMIACT# pin is deasserted (within one CLK). To guarantee 
this behavior, the constraints on setup and hold timings on the interaction of FLUSH# and SMI- 
ACT# as specified for a processor should be followed. 


If the SMRAM area is overlaid over normal memory and if the system designer does not want to 
flush the caches upon leaving SMM, then references to the SMRAM area should not be cached. 
It is the obligation of the system designer to ensure that the KEN# pin is sampled inactive during 
all references to the SMRAM area. Figures 8-14 and 8-15 illustrate a cached and non-cached 
SMM using FLUSH# and KEN#. 
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State SMM State Normal 
Slave Handler Resume Cycle 
SMI# 
SMIACT# 
FLUSH# 
A5238-01 
Figure 8-14. Cached SMM 
State SMM State Normal 
Slave Handler Resume Cycle 
SMI# 
RSM 
SMIACT# Ee 
KEN# 
FLUSH# 
A5239-01 
Figure 8-15. Non-Cached SMM 
8.6.2.1 Write-Back Enhanced IntelDX4™ Processor System Management Mode and 


Cache Flushing 


Regardless of the on-chip cache mode (i.e., write-through or write-back) it is recommended that 
SMRAM be non-overlaid. This provides the greatest freedom for caching both SMRAM and nor- 
mal memory, provides a simplified memory controller design, and eliminates the performance 
penalty of flushing. 


In general, cache flushing is not required when the SMRAM and normal memory are not over- 
laid. Table 8-7 gives the cache flushing requirements for entering and exiting SMM, when the 
SMRAM is not overlaid with normal memory space. 
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SMRAM can not be cached as write-back lines. If SMRAM is cached, it should be cached only 
as write-through lines. This is because dirty lines can not be written back to SMRAM upon exit 
from SMM. The de-assertion of SMIACT# signals that the processor is exiting SMM, and is used 
to assert FLUSH#. By the time the write back of dirty lines occurs, SMIACT# would already be 
inactive, so the SMRAM could no longer be decoded. When the SMRAM is cached as write- 
through, this problem does not occur. 


Table 8-7. Cache Flushing (Non-Overlaid SMRAM) 


Normal Memory , 
Cacheable SMRAM Cacheable FLUSH Entering SMM 

No No No 

No WT No 

WT No No 

WB No No, but Snoop WBs must go to Normal 
Memory Space. 

WT WT No 

WB WT No, but Snoop and Replacement WBs 
must go to normal memory space. 


Coherency requirements must be met when normal memory is cached in write-back mode. In this 
case, the snoop and replacement write-backs that occur during SMM must go to normal memory, 
even though SMIACTH# is active. This requirement is compatible with SMM security require- 
ments, because these write backs can not decode the SMRAM, and the memory system must be 
able to handle this situation properly. 


If SMRAM is overlaid with normal memory space, additional system design features are needed 
to ensure that cache coherency is maintained. Table 8-8 lists the cache flushing requirements for 
entering and exiting the SMM when the SMRAM is overlaid with normal memory space. 


Table 8-8. Cache Flushing (Overlaid SMRAM) 


Normal Memory FLUSH Entering FLUSH Exiting 
Cacheable SRAM Gacheable SMM SMM 
No No No No 
No WT No Yes 
WT or WB No Yes No 
WT or WB WT Yes Yes 


If SMI# and FLUSH# are asserted together, the Write-Back Enhanced Intel486 processor guar- 
antees that FLUSH# is recognized first, followed by the SMI#. If the cache is configured in the 
write-back mode, the modified lines are written back to the normal user space, followed by the 
two special cycles. The SMI# is then recognized and the transition to SMM occurs, as shown in 
Figure 8-16. 
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Cache flushing during SMM exit is accomplished by asserting the FLUSH# pin after the 
SMIACT# pin is deasserted (within 1 CLK). To guarantee this behavior, follow the constraints 
on setup and hold timings for the interaction of FLUSH# and SMIACT# as specified for the 
Write-Back Enhanced IntelDX4 processor. 


The WBINVD instruction should not be used to flush the cache when exiting SMM. Instead, the 
FLUSH# pin should be asserted after the SMIACT# pin is deasserted (within one CLK). The 
cache coherency requirements associated with SMM and write-through vs. write-back caches 
also apply to second-level cache control designs. The appropriate second-level cache flushing 
also is required upon entering and exiting the SMM. 


Write- 

Back State SMM State Normal 

Flash Cache Cycles, Slave Handler Resume Cycle 
SMIl# 

RSM 
SMIACT# Ea 
FLUSH# 
Cache must Cache must 
be empty be empty 
A5240-01 


Figure 8-16. Write-Back Enhanced IntelDX4™ Processor Cache Flushing for Overlaid 
SMRAM upon Entry and Exit of Cached SMM 


8.6.2.2 Snoop During SMM 


Snoops cycles are allowed during SMM. However, because the SMRAM is always cached as a 
write-through, there can never be a snoop hit to a modified line in the SMRAM address space. 
Consequently, if there is a snoop hit to a modified line, it corresponds to the normal address space. 
In this case, even though SMIACT# is asserted, the memory controller must drive the snoop 
write-back cycle to the normal memory space and not to the SMRAM address space. 


If the overlaid normal memory is cacheable, FLUSH# must be asserted when entering SMM, 
causing all modified lines of normal memory to be written back. As a result, there cannot be a 
snoop hit to a modified line in the cacheable normal memory space that is overlaid with the 
SMRAM space. 


If the overlaid normal memory is not cacheable, no flushing is necessary when entering SMM. If 
normal memory is not overlaid with SMRAM, no flushing is required upon entering SMM and it 
is possible that a snoop can hit a modified line cached from anywhere in normal memory space 
while the processor is in SMM. 
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8.6.3 A20M# Pin and SMBASE Relocation 


Systems based on a PC-compatible architecture contain a feature that enables the processor ad- 
dress bit A20 to be forced to 0. This limits physical memory to a maximum of | Mbyte, and is 
provided to ensure compatibility with those programs that relied on the physical address wrap 
around functionality of the 8088 processor. The A20M# pin on Intel486 processors provides this 
function. When A20M# is active, all external bus cycles drive A20M# low, and all internal cache 
accesses are performed with A20M# low. 


The A20M# pin is recognized while the processor is in SMM. The functionality of the AZOM# 
input must be recognized in the following two instances: 


1. Ifthe SMM handler needs to access system memory space above | Mbyte (for example, 
when saving memory to disk for a zero-volt suspend), the A2OM# pin must be deasserted 
before the memory above | Mbyte is addressed. 


2. IfSMRAM has been relocated to address space above | Mbyte, and A20M# is active 
upon entering SMM, the processor attempts to access SMRAM at the relocated address, 
but with A20 low. This could cause the system to crash, because there would be no valid 
SMM interrupt handler at the accessed location. 


In order to account for the above two situations, the system designer must ensure that A20M# is 
deasserted on entry to SMM. A20M# must be driven inactive before the first cycle of the SMM 
state save, and must be returned to its original level after the last cycle of the SMM state restore. 
This can be done by blocking the assertion of A2ZOM# when SMIACT# is active. 


8.6.4 Processor Reset During SMM 


The system designer should take into account the following restrictions while implementing the 
processor RESET logic: 


1. When running software written for the 80286 processor, an SRESET is used to switch the 
processor from Protected Mode to Real Mode. Note that SRESET has a higher interrupt 
priority than SMIACT#. When the processor is in SMM, the SRESET to the processor 
during SMM should be blocked until the processor exits SMM. SRESET must be blocked 
starting from the time SMI# is driven active and ending at least 20 CLK cycles after 
SMIACTH# is de-asserted. Be careful not to block the global system RESET, which may be 
necessary to recover from a system crash. 


2. During execution of the RSM instruction to exit SMM, there is a small time window 
between the de-assertion of SMIACT# and the completion of the RSM microcode. If 
SRESET is asserted during this window, it is possible that the SMRAM space will be 
violated. The system designer must guarantee that SRESET is blocked until at least 20 
processor clock cycles after SMIACT# has been driven inactive. 


3. Any request for a processor SRESET for the purpose of switching the processor from 
Protected Mode to Real Mode must be acknowledged after the processor has exited SMM. 
In order to maintain software transparency, the system logic must latch any SRESET 
signals that are blocked during SMM. 
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8.6.5 SMM and Second-Level Write Buffers 


Before an Intel486 processor enters SMM, it empties its internal write buffers. This is necessary 
so that the data in the write buffers is written to normal memory space, not SMM space. Once the 
processor is ready to begin writing an SMM state save to SMRAM, it asserts SMIACT#. SMI- 
ACT# may be driven active by the processor before the system memory controller has had an op- 
portunity to empty the second-level write buffers. 


To prevent the data from these second level write buffers from being written to the wrong loca- 
tion, the system memory controller must direct the memory write cycles to either SMM space or 
normal memory space. This can be accomplished by saving the status of SMIACT# along with 
the address for each word in the write buffers. 


8.6.6 Nested SMI#s and I/O Restart 


Special care must be taken when executing an SMM handler for the purpose of restarting an I/O 
instruction. When the processor executes a RSM instruction with the I/O restart slot set, the re- 
stored EJP is modified to point to the instruction immediately preceding the SMI# request, so that 
the I/O instruction can be re-executed. If a new SMI# request is received while the processor is 
executing an SMM handler, the processor services this SMI# request before restarting the origi- 
nal I/O instruction. If the I/O restart slot is set when the processor executes the RSM instruction 
for the second SMM handler, the RSM microcode decrements the restored EIP again. EI, there- 
fore, points to an address different than the originally interrupted instruction, and the processor 
begins execution of the interrupted application code at an incorrect entry point. 


To prevent this problem, the SMM handler routine must not set the I/O restart slot during the sec- 
ond of two consecutive SMM handlers. 


8.7 SMM SOFTWARE CONSIDERATIONS 


8.7.1. SMM Code Considerations 


The default operand size and the default address size are 16 bits; however, operand-size override 
and address-size override prefixes can be used as needed to directly access data anywhere within 
the 4-Gbyte logical address space. 


With operand-size override prefixes, the SMM handler can use jumps, calls, and returns to trans- 
fer control to any location within the 4-Gbyte space. Note, however, the following restrictions: 


¢ Any control transfer that does not have an operand-size override prefix truncates EJP to 16 
low-order bits. 


¢ Due to the Real Mode style of base-address formation, a far jump or call cannot transfer 
control to a segment with a base address of more than 20 bits (one Mbyte). 
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8.7.2 Exception Handling 


Upon entry into SMM, external interrupts that require handlers are disabled (the IF bit in the 
EFLAGS is cleared). This is necessary because, while the processor is in SMM, it is running in 
a separate memory space. Consequently the vectors stored in the interrupt descriptor table (IDT) 
for the prior mode are not applicable. Before allowing exception handling (or software inter- 
rupts), the SMM program must initialize new interrupt and exception vectors. The interrupt vec- 
tor table for SMM has the same format as for Real Mode. Until the interrupt vector table is 
correctly initialized, the SMM handler must not generate an exception (or software interrupt). 
Even though hardware interrupts are disabled, exceptions and software interrupts can occur. Only 
a correctly written SMM handler can prevent internal exceptions. When new exception vectors 
are initialized, internal exceptions can be serviced. The following restrictions apply: 


1. Due to the Real Mode style of base address formation, an interrupt or exception cannot 
transfer control to a segment with a base address of more that 20 bits. 


2. An interrupt or exception cannot transfer control to a segment offset of more than 16 bits 
(64 Kbytes). 


3. If exceptions or interrupts are allowed to occur, only the low order 16 bits of the return 
address (EIP) are pushed onto the stack. If the offset of the interrupted procedure is greater 
than 64 Kbytes, it is not possible for the interrupt/exception handler to return control to 
that procedure. (One work-around could be to perform software adjustment of the return 
address on the stack.) 


4. The SMBASE relocation feature affects the way the processor returns from an interrupt or 
exception during an SMI# handler. 


8.7.3. Halt During SMM 


HALT should not be executed during SMM, unless interrupts have been enabled (see Section 
8.7.2 “Exception Handling”). Interrupts are disabled in SMM. INTR, NMI, and SMI# are the only 
events that take the processor out of HALT. 


8.7.4 Relocating SMRAM to an Address Above One Megabyte 


Within SMM (or Real Mode), the segment base registers can be updated only by changing the 
segment register. The segment registers contain only 16 bits, which allows only 20 bits to be used 
for a segment base address (the segment register is shifted left four bits to determine the segment 
base address). If SMRAM is relocated to an address above one megabyte, the segment registers 
can no longer be initialized to point to SMRAM. 


These areas can be accessed by using address override prefixes to generate an offset to the correct 
address. For example, if the SMBASE has been relocated immediately below 16 Mbytes, the DS 
and ES registers are still initialized to 0000 OOOOH. We can still access data in SMRAM by using 
32-bit displacement registers: 


mov esi, OOFFxxxxH; 64K segment 
; immediately 
;below 16M 
mov ax,ds:[esi] 
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CHAPTER 9 
HARDWARE INTERFACE 


9.1 INTRODUCTION 


The Intel486™ processor has separate parallel buses for addresses and data. The bidirectional 
data bus is 32 bits wide. The address bus consists of two components: 30 address lines (A[31:2]) 
and 4-byte enable lines (BE[3:0]#). The address lines form the upper 30 bits of the address and 
the byte enables select individual bytes within a 4-byte location. The address lines are bidirec- 
tional for use in cache line invalidations (see Figure 9-1). 


The Intel486 processor’ s burst bus mechanism enables high-speed cache fills from external mem- 
ory. Burst cycles can strobe data into the processor at a rate of one item every clock. Non-burst 
cycles have a maximum rate of one item every two clocks. Burst cycles are not limited to cache 
fills: all read bus cycles requiring more than a single data cycle can be burst. 


During bus hold, the Intel486 processor relinquishes control of the local bus by floating its ad- 
dress, data, and control lines. The Intel486 processor has an address hold (AHOLD) feature in 
addition to bus hold. During address hold, only the address bus is floated; the data and control 
buses can remain active. Address hold is used for cache line invalidations. 


The Intel486 processor supports the IEEE 1149.1 boundary scan. 


This section provides a brief description of the Intel486 processor input and output signals ar- 
ranged by functional groups. The # symbol at the end of a signal name indicates that the active 
or asserted state occurs when the signal is at a low voltage. When # is not present after the signal 
name, the signal is active at a high voltage level. The term “ready” is used to indicate that the 
cycle is terminated with RDY# or BRDY#. 


This chapter and Chapter 10, “Bus Operation,” describe bus cycles and data cycles. A bus cycle 
is at least two-clocks long and begins with ADS# active in the first clock. and RDY# and/or 
BRDY# are active in the last clock. Data is transferred to or from the Intel486 processor during 
a data cycle. A bus cycle contains one or more data cycles. 


NOTE 
The ULP486 GX processor has a 16-bit data bus architecture. 
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9.2 SIGNAL DESCRIPTIONS 


9.2.1 Clock (CLK) 


CLK provides the fundamental timing and the internal operating frequency for the Intel486 pro- 
cessor. All external timing parameters are specified with respect to the rising edge of CLK. 


The Intel486 processor can operate over a wide frequency range; however the CLK frequency 
cannot change rapidly while RESET is inactive. The CLK frequency must be stable for proper 
chip operation because a single edge of CLK is used internally to generate two phases. CLK only 
needs TTL levels for proper operation. Figure 9-2 illustrates the CLK waveform. 


NOTE 
The ULP486 SX and ULP486 GX utilize an advanced DDL circuit that allows 
dynamic clock frequency changes. See the individual processor datasheets for 
further information. 


9.2.2 IntelDX4™ Processor Clock Multiplier Selectable Input (CLKMUL) 


The IntelDX4 processor differs from the InteIDX2™ processor in that it provides for two internal 
clock multiplier ratios: speed-doubled mode and speed-tripled mode. Speed-doubled mode is 
identical to the IntelDX2 processor mode of operation, in which the internal core is operating at 
twice the external bus frequency. Selecting speed-tripled mode causes the internal core frequency 
to operate at three times the external bus frequency. The IntelDX4 processor determines the de- 
sired clock multiplier ratio by sampling the status of the CLKMUL input during cold (power on) 
processor resets. 


NOTE 
The clock multiplier ratio cannot be changed during warm resets. Also, 
SRESET cannot be used to select the clock multiplier ratio. 


NOTE 


The speed-doubled IntelDX4 processor option is neither tested nor guaranteed. 
Designers that wish to implement the speed-doubled mode of operation should 
use the IntelDX2 processor. 
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Figure 9-1. Functional Signal Groupings 


9-3 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


—1.5V 


tx = input setup times : ; 
ty = input hold times, output float, valid and hold times 


Figure 9-2. CLK Waveform 


To determine which clock multiplier to use, the IntelDX4 processor samples the status of CLK- 
MUL while RESET is active. If CLKMUL is driven low during RESET, the frequency of the core 
will be twice the external bus frequency (speed-doubled mode). If driven high or left floating, 
speed-tripled mode is selected (see Table 9-1). In order to allow maximum flexibility, CLKMUL 


can be jumper-configurable to either Vcc to indicate speed-tripled mode or Vgg to indicate 
speed-doubled mode (see Figure 9-3). 


Table 9-1. Clock Multiplier Selection 


CLKMUL Clock External Clock Freq. Internal Clock Freq. 
at RESET Multiplier (MHz) (MHz) 

Voc! or 3 25 75 

Not Driven 33 100 

Vegi 2 50 100 


*This mode is neither tested nor supported. 
#This mode is tested and supported. 
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Figure 9-3. Voltage Detect (VOLDET) Sense Pin 


The clock multiplier selection method is fully backward compatible with Intel486 processor- 
based system designs. The CLKMUL signal occupies a pin that is labeled as an “INC” on other 
Intel486 processors. Therefore, this pin is not driven in other Intel486 processor system designs. 
The IntelDX4 processor contains an internal pull-up resistor on the CLKMUL signal. As shown 
in Table 9-2, when CLKMUL is not driven, the internal core frequency defaults to speed-tripled 
mode. 


The internal pull-up resistor on the CLKMUL pin is disabled while the IntelDX4 processor is in 
the Stop Grant or Stop Clock modes. On a system with CLKMUL connected to Vgg, this prevents 
a low level DC current path from drawing current while the processor is in the Stop Grant or Stop 
Clock states. 


9.2.3. Address Bus (A[31:2], BE[3:0]#) 


A[31:2] and BE[3:0]# form the address bus and provide physical memory and I/O port addresses. 
The Intel486 processor is capable of addressing 4 gigabytes of physical memory space 
(OOO00000H through FFFFFFFFH), and 64 Kbytes of I/O address space (OOOQ0000H through 
OOOOFFFFH). A[31:2] identify addresses to a 4-byte location. BE[3:0]# identify which bytes 
within the 4-byte location are involved in the current transfer. 


Addresses are driven back into the Intel486 processor over A[31:4] during cache line invalida- 
tions. The address lines are active high. When used as inputs into the processor, A[31:4] must 
meet the setup and hold times ty and t)3. A[31:2] are not driven during bus or address hold. 
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The byte enable outputs, BE[3:0]#, determine which bytes must be driven valid for read and write 
cycles to external memory. 


¢ BE3# applies to D[31:24] 
¢ BE2# applies to D[23:16] 
¢ BEl1# applies to D[15:8] 
¢ BEO# applies to D[7:0] 


BE[3:0]# can be decoded to generate AO, Al and BHE# signals used in 8- and 16-bit systems (see 
Table 10-5 in Chapter 10, “Bus Operation”). BE[3:0]# are active low and are not driven during 
bus hold. 


9.2.4 Data Lines (D[31:0]) 


The bidirectional lines D[31:0] form the data bus for the Intel486 processor. D[7:0] define the 
least significant byte and D[31:24] the most significant byte. Data transfers to 8- or 16-bit devices 
are enabled using the data bus sizing feature, which is controlled by the BS8# or BS 16# input 
signals. D[31:0] are active high. For reads, D[31:0] must meet the setup and hold times t,, and t,,. 
D[31:0] are not driven during read cycles and bus hold. 


9.2.5 Parity 


9.2.5.1 Data Parity Input/Outputs (DP[3:0]) 


DP[3:0] are the data parity pins for the processor. There is one pin for each byte of the data bus. 
Even parity is generated or checked by the parity generators/checkers. Even parity means that 
there are an even number of high inputs on the eight corresponding data bus pins and parity pin. 


Data parity is generated on all write data cycles with the same timing as the data driven by the 
Intel486 processor. Even parity information must be driven back to the Intel486 processor on 
these pins with the same timing as read information to ensure that the correct parity check status 
is indicated by the Intel486 processor. 


The values read on these pins do not affect program execution. It is the responsibility of the sys- 
tem to take appropriate actions if a parity error occurs. 


Input signals on DP[3:0] must meet setup and hold times t,, and t,, for proper operation. 


9.2.5.2 Parity Status Output (PCHK#) 


Parity status is driven on the PCHK# pin, and a parity error is indicated by this pin being low. For 
read operations, PCHK# is driven the clock after ready to indicate the parity status for the data 
sampled at the end of the previous clock. Parity is checked during code reads, memory reads and 
I/O reads. Parity is not checked during interrupt acknowledge cycles. PCHK# only checks the 
parity status for enabled bytes as indicated by the byte enable and bus size signals. It is valid only 
in the clock immediately after read data is returned to the Intel486 processor. At all other times, 
it is inactive (high). PCHK# is never floated. 
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Driving PCHK# is the only effect that bad input parity has on the Intel486 processor. The 
Intel486 processor does not vector to a bus error interrupt when bad data parity is returned. In 
systems that do not employ parity, PCHK# can be ignored. In systems not using parity, DP[3:0] 
should be connected to Vcc through a pull-up resistor. 


9.2.6 Bus Cycle Definition 


9.2.6.1 M/lO#, D/C#, W/R# Outputs 


M/IO#, D/C# and W/R# are the primary bus cycle definition signals. They are driven valid as the 
ADS# signal is asserted. M/IO# distinguishes between memory and I/O cycles, D/C# distinguish- 
es between data and control cycles and W/R# distinguishes between write and read cycles. 


Table 9-3 shows bus cycle definitions as a function of M/IO#, D/C# and W/R#. Note the differ- 
ence between the Intel486 processor and Intel386™ processor bus cycle definitions. The halt bus 
cycle type has been moved to location 001 in the Intel486 processor from location 101 in the 
Intel386 processor. Location 101 is now reserved. 


Special bus cycles are discussed in Section 10.3.11, “Special Bus Cycles.” 


Table 9-2. ADS# Initiated Bus Cycle Definitions 


M/lO# D/C# W/R# Bus Cycle Initiated 
0 0 0 Interrupt Acknowledge 
0 0 1 Halt/Special Cycle 
1 0 I/O Read 
a 
1 0 0 Code Read 
1 0 1 Reserved 
1 1 0 Memory Read 
1 1 1 Memory Write 


9.2.6.2 Bus Lock Output (LOCK#) 


LOCK# indicates that the Intel486 processor is running a read-modify-write cycle in which the 
external bus must not be relinquished between the read and write cycles. Read-modify-write cy- 
cles are used to implement memory-based semaphores. Multiple reads or writes can be locked. 


When LOCK# is asserted, the current bus cycle is locked and the Intel486 processor should be 
allowed exclusive access to the system bus. LOCK# goes active in the first clock of the first 
locked bus cycle and goes inactive after ready is returned indicating the last locked bus cycle. 


The Intel486 processor does not acknowledge bus hold when LOCK# is asserted (although it does 
allow an address hold). LOCK# is active low and is floated during bus hold. Locked read cycles 
are not transformed into cache fill cycles if KEN# is returned active. Refer to Section 10.3.7, 
“Pseudo-Locked Cycles,” for a detailed discussion of locked bus cycles. 
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9.2.6.3 Pseudo-Lock Output (PLOCK#) 


The pseudo-lock feature allows atomic reads and writes of memory operands greater than 32 bits. 
These operands require more than one cycle to transfer. The Intel486 processor asserts PLOCK# 
during segment table descriptor reads (64 bits) and cache line fills (128 bits). 


When PLOCK# is asserted, no other master is given control of the bus between cycles. A bus hold 
request (HOLD) is not acknowledged during pseudo-locked reads and writes, with one exception. 
During non-cacheable non-burst code prefetches, HOLD is recognized on memory cycle bound- 
aries even though PLOCK# is asserted. The Intel486 processor drives PLOCK# active until the 
addresses for the last bus cycle of the transaction have been driven, regardless of whether BRDY# 
or RDY# are returned. 


A pseudo-locked transfer is meaningful only if the memory operand is aligned and if its com- 
pletely contained within a single cache line. 


Because PLOCK# is a function of the bus size and KEN# inputs, PLOCK# should be sampled 
only in the clock ready is returned. PLOCK# is active low and is not driven during bus hold (see 
Section 10.3.7, “Pseudo-Locked Cycles”). 


9.2.6.4 PLOCK# Floating-Point Considerations 


For processors with an on-chip FPU, the following must be noted for PLOCK# operation. A 64- 
bit floating-point number must be aligned to an 8-byte boundary to guarantee an atomic access. 
Normally, PLOCK# and BLAST# are inverses of each other. However, during the first cycle of 
a 64-bit floating-point write, both PLOCK# and BLAST# are asserted. Intel486 processors with 
on-chip FPUs also assert PLOCK# during floating-point long reads and writes (64 bits), segment- 
able description reads (64 bits), and code line fills (128 bits). 


9.2.7. Bus Control 


The bus control signals allow the Intel486 processor to indicate when a bus cycle has begun, and 
allow other system hardware to control burst cycles, data bus width, and bus cycle termination. 


9.2.7.1 Address Status Output (ADS#) 


The ADS# output indicates that the address and bus cycle definition signals are valid. This signal 
goes active in the first clock of a bus cycle and goes inactive in the second and subsequent clocks 
of the cycle. ADS# is also inactive when the bus is idle. 


ADS# is used by the external bus circuitry as the indication that the Intel486 processor has started 
a bus cycle. The external circuit must sample the bus cycle definition pins on the next rising edge 
of the clock after ADS# is driven active. 


ADS# is active low and is not driven during bus hold. 
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9.2.7.2 Non-Burst Ready Input (RDY#) 


RDY# indicates that the current bus cycle is complete. In response to a read, RDY# indicates that 
the external system has presented valid data on the data pins. In response to a write request, 
RDY# indicates that the external system has accepted the Intel486 processor data. RDY# is ig- 
nored when the bus is idle and at the end of the first clock of the bus cycle. Because RDY# is 
sampled during address hold, data can be returned to the processor when AHOLD is active. 


RDY# is active low, and is not provided with an internal pull-up resistor. This input must satisfy 
setup and hold times t,, and t,, for proper chip operation. 


9.2.8 Burst Control 


9.2.8.1 Burst Ready Input (BRDY#) 


BRD Y# performs the same function during a burst cycle that RDY# performs during a non-burst 
cycle. BRD Y# indicates that the external system has presented valid data on the data pins in re- 
sponse to a read or that the external system has accepted the Intel486 processor data in response 
to a write. BRDY# is ignored when the bus is idle and at the end of the first clock in a bus cycle. 


During a burst cycle, BRD Y# is sampled each clock. If it is active, the data presented on the data 
bus pins is strobed into the Intel486 processor. ADS# is negated during the second through last 
data cycles in the burst, but address lines A[3:2] and the byte enables change to reflect the next 
data item expected by the Intel486 processor. 


If RDY# is returned simultaneously with BRDY#, BRD Y# is ignored and the burst cycle is pre- 
maturely aborted. An additional complete bus cycle is initiated after an aborted burst cycle if the 
cache line fill was not complete. BRDY# is treated as a normal ready for the last data cycle in a 
burst transfer or for non-burstable cycles (see Section 10.3.2, “Multiple and Burst Cycle Bus 
Transfers” for burst cycle timing). 


BRDY# is active low and is provided with a small internal pull-up resistor. BRD Y# must satisfy 
the setup and hold times t,, and t,,. 


9.2.8.2 Burst Last Output (BLAST#) 


BLAST# indicates that the next time BRDY# is returned it will be treated as a normal RDY#, 
terminating the line fill or other multiple-data-cycle transfer. BLAST# is active for all bus cycles 
regardless of whether they are cacheable or not. This pin is active low and is not driven during 
bus hold. 
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9.2.9 Interrupt Signals 


The interrupt signals can interrupt or suspend execution of the processor’s instruction stream. 


9.2.9.1 Reset Input (RESET) 


The RESET input must be used at power-up to initialize the processor. RESET forces the proces- 
sor to begin execution at a known state. The processor cannot begin execution of instructions until 
at least 1 ms after Vcc and CLK reach their proper DC and AC specifications. The RESET pin 
should remain active during this time to ensure proper processor operation. However, for warm 
boot-ups RESET should remain active for at least 15 CLK periods. RESET is active high. RESET 
is asynchronous but must meet setup and hold times t,,) and t,, for recognition in any specific 
clock. 


RESET returns SMBASE to the default value of 30000H. If SMBASE relocation is not used, RE- 
SET can be used as the only reset (see Chapter 8, “System Management Mode (SMM) Architec- 
tures’). 


The Intel486 processor is placed in the Power Down Mode if RESERVED# is sampled active at 
the falling edge of RESET. 


9.2.9.2 Soft Reset Input (SRESET) 


The SRESET (soft reset) input has the same functions as RESET, but does not change the SM- 
BASE, and RESERVED# is not sampled on the falling edge of SRESET. If the system uses SM- 
BASE relocation, the soft resets should be handled using the SRESET input. SRESET should not 
be used for the cold boot-up power-on reset. 


The SRESET input pin is provided to save the status of SMBASE during an Intel 80286 proces- 
sor-compatible mode change. SRESET leaves the location of SMBASE intact while resetting 
other units, including the on-chip cache. See Section 9.2.18.4, “Soft Reset (SRESET)” on page 
9-20 for Write-Back Enhanced Intel486 processor differences. For compatibility, the system 
should use SRESET to flush the on-chip cache. The FLUSH# input pin should be used to flush 
the on-chip cache. SRESET should not be used to initiate test modes. 


9.2.9.3 System Management Interrupt Request Input (SMI#) 


SMI# is the system management mode interrupt request signal. The SMI# request is acknowl- 
edged by the SMIACT# signal. After the SMI# interrupt is recognized, the SMI# signal is masked 
internally until the RSM instruction is executed and the interrupt service routine is complete. 
SMI# is falling-edge sensitive after internal synchronization. 


The SMI# input must be held inactive for at least four clocks after it is asserted to reset the edge 
triggered logic. SMI# is provided with a pull-up resistor to maintain compatibility with designs 
that do not use this feature. SMI# is an asynchronous signal, but setup and hold times t,, and t,, 
must be met in order to guarantee recognition on a specific clock. 
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9.2.9.4 System Management Mode Active Output (SMIACT#) 


SMIACT# indicates that the processor is operating in System Management Mode. The processor 
asserts SMIACT# in response to an SMI interrupt request on the SMI# pin. SMIACT# is driven 
active after the processor has completed all pending write cycles (including emptying the write 
buffers), and before the first access to SMRAM, in which the processor saves (writes) its state (or 
context) to SMRAM. SMIACT# remains active until the last access to SMRAM when the pro- 
cessor restores (reads) its state from SMRAM. The SMIACT# signal does not float in response 
to HOLD. The SMIACT# signal is used by the system logic to decode SMRAM. 


9.2.9.5 Maskable Interrupt Request Input (INTR) 


INTR indicates that an external interrupt has been generated. Interrupt processing is initiated 
when the IF flag is active in the EFLAGS register. 


The Intel486 processor generates two locked interrupt acknowledge bus cycles in response to as- 
serting the INTR pin. An 8-bit interrupt number is latched from an external interrupt controller at 
the end of the second interrupt acknowledge cycle. INTR must remain active until the interrupt 
acknowledges have been performed to assure program interruption. Refer to Section 10.3.10, “In- 
terrupt Acknowledge,” for a detailed discussion of interrupt acknowledge cycles. 


The INTR pin is active high and is not provided with an internal pull-down resistor. INTR is asyn- 
chronous, but the INTR setup and hold times ty9 and ty; must be met to assure recognition on any 
specific clock. 


9.2.9.6 Non-maskable Interrupt Request Input (NMI) 


NMI is the non-maskable interrupt request signal. Asserting NMI causes an interrupt with an in- 
ternally supplied vector value of 2. External interrupt acknowledge cycles are not generated be- 
cause the NMI interrupt vector is internally generated. When NMI processing begins, the NMI 
signal is masked internally until the IRET instruction is executed. 


NM is rising edge sensitive after internal synchronization. NMI must be held low for at least four 
CLK periods before this rising edge for proper operation. NMI is not provided with an internal 
pull-down resistor. NMI is asynchronous but setup and hold times, ty and t,; must be met to as- 
sure recognition on any specific clock. 


9.2.9.7 Stop Clock Interrupt Request Input (STPCLK#) 


The Intel486 processor provides an interrupt mechanism, STPCLK#, that allows system hard- 
ware to control the processor’ s power consumption. The STPCLK# signal can be asserted to stop 
the internal clock (output of the PLL) to the processor core in a controlled manner. This low-pow- 
er state is called the Stop Grant state. In addition, the STPCLK# interrupt allows the system to 
change the input frequency within the specified range or completely stop the CLK input frequen- 
cy (input to the PLL). If the CLK input is completely stopped, the processor enters into the Stop 
Clock state—the lowest power state. If the frequency is changed or stopped, the Intel486 proces- 
sor does not return to the Stop Grant state until the CLK input has been running at a constant fre- 
quency for the time period necessary to stabilize the PLL (minimum of | ms). 
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The Intel486 processor generates a Stop Grant bus cycle in response to the STPCLK# interrupt 
request. STPCLK# is active low and is provided with an internal pull-up resistor. STPCLK# is 
an asynchronous signal, but must remain active until the processor issues the Stop Grant bus cycle 
(see Section 10.3.11.3, “Stop Grant Indication Cycle”). 


9.2.10 Bus Arbitration Signals 


This section describes the mechanism by which the processor relinquishes control of its local bus 
when the local bus is requested by another bus master. 


9.2.10.1 Bus Request Output (BREQ) 


The Intel486 processor asserts BREQ when a bus cycle is pending internally. Thus, BREQ is al- 
ways asserted in the first clock of a bus cycle, along with ADS#. If the Intel486 processor cur- 
rently is not driving the bus (due to HOLD, AHOLD, or BOFF#), BREQ is asserted in the same 
clock that ADS# would have been asserted if the Intel486 processor were driving the bus. After 
the first clock of the bus cycle, BREQ may change state. It is asserted if additional cycles are nec- 
essary to complete a transfer (via BS8#, BS16#, KEN#), or if more cycles are pending internally. 
However, if no additional cycles are necessary to complete the current transfer, BREQ can be ne- 
gated before ready comes back for the current cycle. External logic can use the BREQ signal to 
arbitrate among multiple processors. This pin is driven regardless of the state of bus hold or ad- 
dress hold. BREQ is active high and is never floated. During a hold state, internal events may 
cause BREQ to be de-asserted prior to any bus cycles. 


9.2.10.2 | Bus Hold Request Input (HOLD) 


HOLD allows another bus master complete control of the Intel486 processor bus. The Intel486 
processor responds to an active HOLD signal by asserting HLDA and placing most of its output 
and input/output pins in a high impedance state (floated) after completing its current bus cycle, 
burst cycle, or sequence of locked cycles. In addition, if the Intel486 processor receives a HOLD 
request while performing a code fetch, and that cycle is backed off (BOFF#), the Intel486 pro- 
cessor will recognize HOLD before restarting the cycle. The code fetch can be non-cacheable or 
cacheable and non-burst or burst. The BREQ, HLDA, PCHK# and FERR# pins are not floated 
during bus hold. The Intel486 processor maintains its bus in this state until the HOLD is de-as- 
serted. Refer to Section 10.3.9, “Bus Hold,” for timing diagrams for bus hold cycles and HOLD 
request acknowledge during BOFF#. 


Unlike the Intel386 processor, the Intel486 processor recognizes HOLD during reset. Pull-up re- 
sistors are not provided for the outputs that are floated in response to HOLD. HOLD is active high 
and is not provided with an internal pull-down resistor. HOLD must satisfy setup and hold times 
t,, and t,, for proper chip operation. 


9-12 | 


intel. HARDWARE INTERFACE 


9.2.10.3_ Bus Hold Acknowledge Output (HLDA) 


HLDA indicates that the Intel486 processor has given the bus to another local bus master. HLDA 
goes active in response to a hold request presented on the HOLD pin. HLDA is driven active in 
the same clock in which the Intel486 processor floats its bus. 


HLDA is driven inactive when leaving bus hold, and the Intel486 processor resumes driving the 
bus. The Intel486 processor does not cease internal activity during bus hold because the internal 
cache satisfies the majority of bus requests. HLDA is active high and remains driven during bus 
hold. 


9.2.10.4 Backoff Input (BOFF#) 


Asserting the BOFF# input forces the Intel486 processor to release control of its bus in the next 
clock. The pins floated are exactly the same as those floated in response to HOLD. The response 
to BOFF# differs from the response to HOLD in two ways: First, the bus is floated immediately 
in response to BOFF#, whereas the Intel486 processor completes the current bus cycle before 
floating its bus in response to HOLD. Second the Intel486 processor does not assert HLDA in 
response to BOFF#. 


The Intel486 processor remains in bus hold until BOFF# is negated. Upon negation, the Intel486 
processor restarts the bus cycle that was aborted when BOFF# was asserted. To the internal exe- 
cution engine the effect of BOFF# is the same as inserting a few wait states to the original cycle. 
Refer to Section 10.3.12, “Bus Cycle Restart,” for a description of bus cycle restart. 


Any data returned to the Intel486 processor while BOFF# is asserted is ignored. BOFF# has high- 
er priority than RDY# or BRDY#. If both BOFF# and ready are returned in the same clock, BO- 
FF# takes effect. If BOFF# is asserted while the bus is idle, the Intel486 processor floats its bus 
in the next clock. BOFF# is active low and must meet setup and hold times t,, and t,, for proper 
chip operation. 


9.2.11 Cache Invalidation 


The AHOLD and EADS# inputs are used during cache invalidation cycles. AHOLD conditions 
the Intel486 processor address lines, A[31:4], to accept an address input. EADS# indicates that 
an external address is actually valid on the address inputs. Activating EADS# causes the Intel486 
processor to read the external address bus and perform an internal cache invalidation cycle to the 
address indicated. Refer to Section 10.3.8, “Invalidate Cycles,” for cache invalidation cycle tim- 
ing. 
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9.2.11.1 Address Hold Request Input (AHOLD) 


AHOLD is the address hold request. It allows another bus master access to the Intel486 processor 
address bus for performing an internal cache invalidation cycle. Asserting AHOLD forces the 
Intel486 processor to stop driving its address bus in the next clock. While AHOLD is active only 
the address bus is floated, the remainder of the bus can remain active. For example, data can be 
returned for a previously specified bus cycle when AHOLD is active. The Intel486 processor 
does not initiate another bus cycle during address hold. Because the Intel486 processor floats its 
bus immediately in response to AHOLD, an address hold acknowledge is not required. If 
AHOLD is asserted while a bus cycle is in progress and no readies are returned during the time 
AHOLD is asserted, the Intel486 processor re-drives the same address (that it originally sent out) 
once AHOLD is negated. 


AHOLD is recognized during reset. Because the entire cache is invalidated by reset, any invali- 
dation cycles run during reset is unnecessary. AHOLD is active high and is provided with a small 
internal pull-down resistor. It must satisfy the setup and hold times t,, and t,, for proper chip op- 
eration. AHOLD also determines whether or not the built-in self-test features of the Intel486 pro- 
cessor are exercised on assertion of RESET. For more information on these features, see “Built- 
In Self Test (BIST)” in Appendix B. 


9.2.11.2 External Address Valid Input (EADS#) 


EADS¢# indicates that a valid external address has been driven onto the Intel486 processor address 
pins. This address is used to perform an internal cache invalidation cycle. The external address is 
checked with the current cache contents. If the specified address matches an area in the cache, 
that area is immediately invalidated. 


An invalidation cycle can be run by asserting EADS# regardless of the state of AHOLD, HOLD 
and BOFF#. EADS# is active low and is provided with an internal pull-up resistor. EADS# must 
satisfy the setup and hold times t,, and t,, for proper chip operation. 


9.2.12 Cache Control 


9.2.12.1 Cache Enable Input (KEN#) 


KEN# is the cache enable pin. KEN# is used to determine whether the data being returned by the 
current cycle is cacheable. When KEN# is active and the Intel486 processor generates a cycle that 
can be cached (most read cycles), the cycle is transformed into a cache line fill cycle. 


A cache line is 16 bytes long. During the first cycle of a cache line fill, the byte-enable pins should 
be ignored and data should be returned as if all four byte enables were asserted. The Intel486 pro- 
cessor runs between 4 and 16 contiguous bus cycles to fill the line depending on the bus data 
width selected by BS8# and BS16#. Refer to Section 10.3.3, “Cacheable Cycles,” for a descrip- 
tion of cache line fill cycles. 


The KEN# input is active low and is provided with a small internal pull-up resistor. It must satisfy 
the setup and hold times t,, and t,; for proper chip operation. 
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9.2.12.2 Cache Flush Input (FLUSH#) 


The FLUSH# input forces the Intel486 processor to flush its entire internal cache. FLUSH# is 
active low and must be asserted for one clock only. FLUSH# is asynchronous but setup and hold 
times t,) and t,, must be met for recognition on any specific clock. 


FLUSH# also determines whether or not the three-state test mode of the Intel486 processor is in- 
voked on assertion of RESET (see “Three-State Output Test Mode” in Appendix B). 


9.2.13 Page Cacheability (PWT, PCD) 


The PWT and PCD output signals correspond to two user attribute bits in the page table entry. 
When paging is enabled, PWT and PCD correspond to bits 3 and 4 of the page table entry, re- 
spectively. For cycles that are not paged when paging is enabled (for example I/O cycles) PWT 
and PCD correspond to bits 3 and 4 in Control Register 3. When paging is disabled, the Intel486 
processor ignores the PCD and PWT bits and assumes they are zero for the purpose of caching 
and driving PCD and PWT. 


PCD is masked by the CD (cache disable) bit in Control Register 0 (CRO). When CD=1 (cache 
line fills disabled) the Intel486 processor forces PCD high. When CD=0, PCD is driven with the 
value of the page table entry/directory. 


The purpose of PCD is to provide a cacheable/non-cacheable indication on a page by page basis. 
The Intel486 processor does not perform a cache fill to any page in which bit 4 of the page table 
entry is set. PWT corresponds to the write-back bit and can be used by an external cache to pro- 
vide this functionality. PCD and PWT bits are assigned a value of zero during Real Mode and 
when paging is disabled. Refer to Section 7.6, “Page Cacheability,” for a discussion of non- 
cacheable pages. 


PCD and PWT have the same timing as the cycle definition pins (M/IO#, D/C#, W/R#). PCD and 
PWT are active high and are not driven during bus hold. 


NOTE 
The PWT and PCD bits function differently in the write-back mode of the 
Write-Back Enhanced Intel486 processors (see Section 7.6.1, “Write-Back 
Enhanced IntelDX4™ Processor and Processor Page Cacheability”’). 


9.2.14 RESERVED# 


The RESERVED# input detects the presence of an in-circuit emulator, then powers down the 
core, and three-states all outputs of the original processor, so that the original processor consumes 
very low current. This state is known as Reserved Power Down Mode. RESERVED# is active 
low and sampled at all times, including after power-up and during reset. 


9.2.15 Numeric Error Reporting (FERR#, IGNNE#) 


To allow PC-type floating-point error reporting, IntelDX2 and IntelDX4 processors provide two 
pins, FERR# and IGNNE#. 
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9.2.15.1 Floating-Point Error Output (FERR#) 


The processor asserts FERR# when an unmasked floating-point error is encountered. FERR# is 
similar to the ERROR# pin on the Intel387 math coprocessor. FERR# can be used by external 
logic for PC-type floating-point error reporting in systems with an IntelDX2 or IntelDX4 proces- 
sor. FERR# is active low and is not floated during bus hold. 


In some cases, FERR# is asserted when the next floating-point instruction is encountered. In oth- 
er cases, it is asserted before the next floating-point instruction is encountered, depending on the 
execution state of the instruction that caused the exception. 


The following class of floating-point exceptions assert FERR# at the time the exception occurs 
(i.e., before encountering the next floating-point instruction): 


1. The stack fault, invalid operation, and denormal exceptions on all transcendental 
instructions, integer arithmetic instructions, FSQRT, FSCALE, FPREM(1), FXTRACT, 
FBLD, and FBSTP. 


2. Any exceptions on store instructions (including integer store instructions). 


The following class of floating-point exceptions assert FERR# only after encountering the next 
floating-point instruction: 


1. Exceptions other than on all transcendental instructions, integer arithmetic instructions, 
FSQRT, FSCALE, FPREM(1), FXTRACT, FBLD, and FBSTP. 


2. Any exception on all basic arithmetic, load, compare, and control instructions (i.e., all 
other instructions). 


In the event of a pending unmasked floating-point exception the FNINIT, FNCLEX, FNSTENV, 
FNSAVE, FNSTSW and FNSTCW instructions assert the FERR# pin. Shortly after the assertion 
of the pin, an interrupt window is opened during which the processor samples and services inter- 
rupts, if any. If no interrupts are sampled within this window, the processor then executes these 
instructions with the pending unmasked exception. However, for the FNCLEX, FNINIT, FN- 
STENV and FNSAVE instructions, the FERR# pin is de-asserted to enable the execution of these 
instructions. For details, please refer to the Intel486™ Processor Family Programmer's Refer- 
ence Manual. 


9.2.15.2 Ignore Numeric Error Input (IGNNE#) 


When IGNNE# is asserted and FERR# is still activated, IntelDX2 and IntelDX4 processors ig- 
nore numeric errors and continue executing non-control floating-point instructions. When 
IGNNE# is not asserted and a pending unmasked numeric exception exists (SW.ES=1), the 
Intel486 processor behaves as follows: 


When the Intel486 processor encounters the floating-point instructions FNINIT, FNCLEX, FN- 
STENV, FNSAVE, FNSTSW or FNSTCW, the processor asserts the FERR# pin. Subsequently, 
the processor opens an interrupt sampling window. The interrupts are checked and serviced dur- 
ing this window. If no interrupts are sampled within this window the processor then executes 
these instructions in spite of the pending unmasked exception. For further details, please refer to 
the Intel486™ Processor Family Programmer's Reference Manual. 
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When the Intel486 processor encounters any floating-point instruction other than FNINIT, FN- 
CLEX, FNSTENV, FNSAVE, FNSTSW or FNSTCW, the processor stops execution, asserts the 
FERR# pin, and waits for an external interrupt. 


IGNNE# has no effect when the NE bit in control register 0 is set. 


The IGNNE# input is active low and provided with a small internal pull-up resistor. This input is 
asynchronous, but must meet setup and hold times ty) and t,, to ensure recognition on any specific 
clock. 


NOTE 
The IGNNE# and FERR# pins doe not exist on the Intel486 SX processors. 


9.2.16 Bus Size Control (BS16#, BS8#) 


The BS16# and BS8# inputs allow external 16- and 8-bit buses to be supported with a small num- 
ber of external components. The Intel486 processor samples these pins every clock. The bus size 
is determined by the value sampled in the clock before ready. When asserting BS16# or BS8#, 
only 16 or 8 bits of the data bus must be valid. If both BS16# and BS8# are asserted, an 8-bit bus 
width is selected. 


When BS 16# or BS8# are asserted, the Intel486 processor converts a larger data request to the 
appropriate number of smaller transfers. The byte enables are also modified appropriately for the 
bus size selected. 


BS 16# and BS 8# are active low and are provided with small internal pull-up resistors. BS 16# and 
BS8# must satisfy the setup and hold times t,, and t,; for proper chip operation. 


NOTE 
The BS8# and BS 16# are not available on the Ultra-Low Power Intel486 GX 
processor. 


9.2.17 Address Bit 20 Mask (A20M#) 


Asserting the A20M# input causes the Intel486 processor to mask physical address bit 20 before 
performing a lookup in the internal cache and before driving a memory cycle to the outside world. 
When A20M# is asserted, the Intel486 processor emulates the 1-Mbyte address wraparound that 
occurs on the 8086. A20M# is active low and must be asserted only when the processor is in Real 
Mode. A20M# is not defined in Protected Mode. A20M# is asynchronous but should meet setup 
and hold times ty9 and t,, for recognition in any specific clock. For correct operation of the chip, 
A20M# should not be active at the falling edge of RESET. 
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A20M# exhibits a minimum 4 clock latency, from time of assertion to masking of the A20 bit. 
A20M# is ignored during cache invalidation cycles. I/O writes require A20M# to be asserted a 
minimum of 2 clocks prior to RDY being returned for the I/O write. This ensures recognition of 
the address mask before the Intel486 processor begins executing the instruction following OUT. 
If A20M# is asserted after the ADS# of a data cycle, the A20 address signal is not masked during 
this cycle but is masked in the next cycle. During a prefetch (cacheable or not), if A20M# is as- 
serted after the first ADS#, A20 is not masked for the duration of the prefetch even if BS16# or 
BS8# is asserted. 


9.2.18 Write-Back Enhanced IntelDX4™ Processor Signals and Other Enhanced 
Bus Features 


This section describes the pins that interface with the system to support the Enhanced Bus 
mode/write-back cache features at system level. 


9.2.18.1 Cacheability (CACHE#) 


The CACHE# output indicates the internal cacheability on read cycles and a burst write-back on 
write cycles. CACHE# is asserted for cacheable reads, cacheable code fetches and write-backs. 
It is driven inactive for non-cacheable reads, special cycles, I/O cycles and write-through cycles. 
This is different from the PCD (page cache disable) pin. The operational differences between 
CACHE# and PCD are listed in Table 9-3. See Table 9-4 for operational differences between 
CACHE# and other Intel486 processor signals. 


Table 9-3. Differences between CACHE# and PCD 


Bus Operation CACHE# PCD 
All reads () same as PCD®) same as 
PcbD®) 

Replacement write-back low low 

Snoop-forced write-back low low 
S-state write-through high same as 
PcbD®) 
|-state write-through (2) high same as 
PCD‘) 

NOTES: 


1. Includes line fills and non-cacheable reads. During locked read cycles 
CACHE is inactive. The non-cacheable reads may or may not be burst. 

2. Due to the non-allocate on write policy, this includes both cacheable and 
non-cacheable writes. PCD distinguishes between the two, but CACHE# 
does not. 

3. This behavior is the same as the existing specification of the Intel486™ 
processor in write-through mode. 
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Table 9-4. CACHE# vs. Other Intel486™ Processor Signals 


Pin Symbol Relation To This Signal 


ADS# CACHE is driven to valid state with ADS#. 
RDY#, BRDY# CACHE+# is de-asserted with the first RDY# or BRDY#. 
HLDA, BOFF# CACHE floats under these signals. 


KEN# The combination of CACHE# and KEN# determines if a read miss is converted into a 
cache line fill. 


9.2.18.2 Cache Flush (FLUSH#) 


FLUSH# is an existing pin that operates differently if the processor is configured for Enhanced 
Bus mode (write-back) operation. In Enhanced Bus mode, FLUSH# is treated as an interrupt and 
acts similarly to the WBINVD instruction. It is sampled at each clock, but is recognized only on 
an instruction boundary. Pending writes are completed before FLUSH# is serviced, and all 
prefetching is stopped. Depending on the number of modified lines in the cache, the flush could 
take up to a minimum of 1280 bus clocks or 2560 processor clocks and a maximum of 5000+ bus 
clocks to scan the cache, perform the write backs, invalidate the cache and run two special cycles. 
After all modified lines are written back to memory, two special bus cycles, the first flush ACK 
cycle and the second flush ACK cycle, are issued, in that order. These cycles differ from the spe- 
cial cycles issued after WBINVD only in that address line 2 = 1. SRESET, STPCLK#, INTR, 
NMI and SMI# are not recognized during a flush write-back, whereas BOFF#, AHOLD and 
HOLD are recognized. 


FLUSH# may be asserted just for a single clock or may be retained asserted, but should be de- 
asserted at or prior to the RDY# returned from the first flush ACK special bus cycle. If asserted 
during INVD or WBINVD, FLUSH# is recognized. If asserted simultaneously with SMI#, then 
SMI# is recognized after FLUSH# is serviced. 


FLUSH# may be driven at any time. If driven during SRESET, it must be held for one clock after 
SRESET is de-asserted to be recognized. 


9.2.18.3  Hit/Miss to a Modified Line (HITM#) 


HITM# is a cache coherency protocol pin that is driven only in Enhanced Bus mode. When a 
snoop cycle is generated (with INV = 0 or INV = 1), HITM# indicates whether the processor con- 
tains the snooped line in the M-state. HITM# asserted indicates that the line will be written back 
in total, unless the processor is already generating a replacement write-back of the same line. 


HITM# is valid on the bus two system clocks after EADS# is asserted on the bus. If asserted, 
HITM# remains asserted until the last RDY# or BRDY# of the snoop write-back cycle is re- 
turned. It is de-asserted before the next ADS# (see Table 9-5). 
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Table 9-5. HITM# vs. Other Intel486™ Processor Signals 


Pin Symbol Relation To This Signal 


EADS# HITM# is asserted due to an EADS#-driven snoop, provided the snooped line is in the 
M-state in the cache. 


HLDA, BOFF# HITM# does not float under these signals. 


ADS#, CACHE# _ | The beginning of a snoop write-back cycle is identified by the assertion of ADS#, 
CACHE#, and HITM#. 


9.2.18.4 Soft Reset (SRESET) 


When in Enhanced Bus mode, SRESET has the following differences: SRESET, unlike RESET, 
does not cause the AHOLD, A20M#, FLUSH#, RESERVED#, and WB/WT# pins to be sampled 
(i.e., special test modes and on-chip cache configuration can not be accessed with SRESET.) 


On SRESET, the internal SMRAM base register retains its previous value and the processor does 
not flush, write-back or disable the internal cache. CRO.CD and CRO.NW retain previous values, 
CRO.4 is set to 1, and the remaining bits are cleared. Because SRESET is treated as an interrupt, 
it is possible to have a bus cycle while SRESET is asserted. A bus cycle could be due to an on- 
going instruction, emptying the write buffers of the processor, or snoop write-back cycles if there 
is a snoop hit to an M-state line while SRESET is asserted. 


NOTES 
For both Standard Bus mode and Enhanced Bus mode: 


SMI# must be blocked during SRESET. It must also be blocked for a 
minimum of two clocks after SRESET is de-asserted. 


SRESET must be blocked during SMI#. It must also be blocked for a 
minimum of 20 clocks after SMIACT# is de-asserted. 


9.2.18.5 —_Invalidation Request (INV) 


INV is a cache coherency protocol pin that is used only in Enhanced Bus mode. It is sampled by 
the processor on EADS#-driven snoop cycles. It is necessary to assert this pin to simulate the 
Standard mode processor invalidate cycle on write-through-only lines. INV also invalidates the 
write-back lines. However, when the snooped line is in the M-state, the line is written back and 
then invalidated. 


INV is sampled when EADS# is asserted. When INV is not asserted with EADS#, the snoop cycle 
has no effect on a write-through-only line or on a line allocated as write-back but not yet modi- 
fied. If the line is write-back and modified, it is written back to memory but is not de-allocated 
(invalidated) from the internal cache. The address of the snooped cache line is provided on the 
address bus (see in Table 9-6). 
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Table 9-6. INV vs. Other Intel486™ Processor Signals 


Pin Symbol Relation To This Signal 
EADS# EADS# determines when INV is sampled. 
A[31:4] The address of the snooped cache line is provided on these pins. 


9.2.18.6 Write-Back/Write-Through (WB/WT#) 


WB/WT# enables Enhanced Bus mode (write-back cache). It also allows the system to define a 
cached line as write-through or write-back. 


WB/WT# is sampled at the falling edge of RESET to determine if Enhanced Bus mode is enabled 
(WB/WT# must be driven for two clocks before and two clocks after RESET to be recognized by 
the processor). If sampled low or floated, the Write-Back Enhanced IntelDX4 processor operates 
in the Intel486 processor Standard mode. For write-through only operation, (i.e. Standard mode), 
WB/WT# does not need to be connected. 


In Enhanced Bus mode, WB/WT# allows the system hardware to force any allocated line to be 
treated as write-through or write-back. As with cacheability, both the processor and the external 
system must agree that a line may be treated as write-back for the internal cache to be allocated 
as write-back. The default is always write-through. The processor’s indication of write-back vs. 
write-through is from the PWT pin, in which function and timing are the same as in the Standard 
mode of the Intel486 processor. 


To define write-back or write-through configuration of a line, WB/WT# is sampled in the same 
clock in which the first RDY# or BRDY# is returned during a line fill (allocation) cycle (see Ta- 
ble 9-7). 


Table 9-7. WB/WT# vs. Other Intel486™ Processor Signals 


Pin Symbol Relation to This Signal 


RDY#, BRDY# WB/WT# is sampled with the first RDY# or BRDY#. 


PWT The combination of WB/WT# and PWT determine whether the Write-Back Enhanced 
IntelDX4™ processor treats the line as WB. 


PCD, CACHE#, The state of WB/WT# does not matter if PCD, CACHE# or KEN# define the line to be 
KEN# non-cacheable. 


W/R# WB/WT# is significant only on read fill cycles. 
RESET WB/WT# is sampled on the falling edge of RESET to define the cache configuration. 
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9.2.18.7  Pseudo-Lock Output (PLOCK#) 


In the Enhanced Bus mode, PLOCK# is always driven inactive. In this mode, a 64-bit data read 
(caused by an FP operand access or a segment descriptor read) is treated as a multiple cycle read 
request, which may be a burst or a non-burst access based on whether BRD Y# or RDY# is re- 
turned by the system. Because only write-back cycles (caused by snoop write-back or replace- 
ment write-back) are burstable, a 64-bit write is driven out as two non-burst bus cycles. BLAST# 
is asserted during both writes. Refer to Section 10.3, “Bus Functional Description,” for details on 
pseudo-locked bus cycles. 


9.2.19 IntelDX4™ Processor Voltage Detect Sense Output (VOLDET) 


A voltage detect sense pin (VOLDET) has been added to the IntelDX4 processor PGA package. 
This pin allows external system logic to distinguish between a5 V IntelDX2™ processor and the 
3.3 V IntelDX4 processor. The pin passively indicates to external logic whether the installed 
PGA processor requires 5 V (in the case of the IntelDX2 processor) or 3.3 V (in the case of the 
IntelD X4 processor). Pin S4 has been defined as the VOLDET pin because this pin is defined as 
an INC pin on the IntelDX2 processor. This pin is provided in the PGA package only. 


To use this feature, a weak external pull-up resistor should be connected to the VOLDET pin. 
This pin samples high (logic 1) if the installed processor is a 5 V IntelDX2 processor. This pin 
samples low (logic 0) if a IntelDX4 processor is installed. Upon sampling the logic level of this 
pin, external logic can enable the proper Vcc level to the processor. In power sensitive applica- 
tions, an active element is preferred for the pull-up device because it can be disabled after sam- 
pling, eliminating the DC current path that results when the installed processor is the IntelDX4 
processor. 


Figure 9-4 shows a logical representation of the voltage detect sense mechanism. 


+5V 
10 KQ 
(or weak 
active pullup) 
Ext | 
VOLDETsense pin Sampling 
| Logic 


Processor Vcc Enable 


Figure 9-4. Voltage Detect (VOLDET) Sense Pin 


This pin can remain a no connect for those system designs that do not wish to utilize this voltage 
detect feature. 
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9.2.20 Boundary Scan Test Signalst 


The following boundary scan test signals are available on all Intel486 processors except the 
Intel486 SX processor in PGA packages. 


9.2.20.1 Test Clock (TCK) 


TCK is an input to the Intel486 processor and provides the clocking function required by the 
JTAG boundary scan feature. TCK is used to clock state information and data into and out of the 
component. State select information and data are clocked into the component on the rising edge 
of TCK on TMS and TDI, respectively. Data is clocked out of the part on the falling edge of TCK 
on TDO. 


In addition to using TCK as a free running clock, it may be stopped in a low, O, state, indefinitely 
as described in IEEE 1149.1. While TCK is stopped in the low state, the boundary scan latches 
retain their state. 


When boundary scan is not used, TCK should be tied high or left as a NC. (This is important dur- 
ing power up to avoid the possibility of glitches on the TCK which could prematurely initiate 
boundary scan operations.) TCK is supplied with an internal pull-up resistor. 


TCK is a clock signal and is used as a reference for sampling other JTAG signals. On the rising 
edge of TCK, TMS and TDI are sampled. On the falling edge of TCK, TDO is driven. 


9.2.20.2 Test Mode Select (TMS) 


TMS is decoded by the JTAG TAP (Tap Access Port) to select the operation of the test logic, as 
described in Section B.5.4, “Test Access Port (TAP) Controller.” 


To guarantee deterministic behavior of the TAP controller, TMS is provided with an internal pull- 
up resistor. If boundary scan is not used, TMS may be tied high or left unconnected. TMS is sam- 
pled on the rising edge of TCK. TMS is used to select the internal TAP states required to load 
boundary scan instructions to data on TDI. For proper initialization of the JTAG logic, TMS 
should be driven high, “1,” for at least four TCK cycles following the rising edge of RESET. 


9.2.20.3 Test Data Input (TDI) 


TDI is the serial input used to shift JTAG instructions and data into the component. The shifting 
of instructions and data occurs during the SHIFT-IR and SHIFT-DR TAP controller states, re- 
spectively. These states are selected using the TMS signal, as described in “Test Access Port 
(TAP) Controller” in Appendix B. 


An internal pull-up resistor is provided on TDI to ensure a known logic state if an open circuit 
occurs on the TDI path. Note that when “1” is continuously shifted into the instruction register, 
the BYPASS instruction is selected. TDI is sampled on the rising edge of TCK, during the 
SHIFT-IR and the SHIFT-DR states. During all other TAP controller states, TDI is a “don't care.” 
TDI is sampled only when TMS and TCK have been used to select the SHIFT-IR or SHIFT-DR 
states in the TAP controller. For proper initialization of JTAG logic, TDI should be driven high 
for at least four TCK cycles following the rising edge of RESET. 


* Some steppings do not support boundary scan. Check with your Intel representative for stepping data and errata. 
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9.2.20.4 Test Data Output (TDO) 


TDO is the serial output used to shift JTAG instructions and data out of the component. The shift- 
ing of instructions and data occurs during the SHIFT-IR and SHIFT-DR TAP controller states, 
respectively. These states are selected using the TMS signal, as described in “Test Access Port 
(TAP) Controller” in Appendix B. When not in SHIFT-IR or SHIFT-DR states, TDO is driven 
to a high impedance state to allow connecting TDO to different devices in parallel. TDO is driven 
on the falling edge of TCK during the SHIFT-IR and SHIFT-DR TAP controller states. At all oth- 
er times TDO is driven to the high impedance state. TDO is only driven when TMS and TCK 
have been used to select the SHIFT-IR or SHIFT-DR states in the TAP controller. 


9.3. INTERRUPT AND NON-MASKABLE INTERRUPT INTERFACE 


The Intel486 processor provides four asynchronous interrupt inputs: INTR (interrupt request), 
NMI (non-maskable interrupt), SMI# (system management interrupt) and STPCLK# (stop clock 
interrupt). This section describes the hardware interface between the instruction execution unit 
and the pins. For a description of the algorithmic response to interrupts, refer to Section 3.15, “In- 
terrupts.” For interrupt timings refer to Section 10.3.10, “Interrupt Acknowledge.” 


9.3.1 Interrupt Logic 


The Intel486 processor contains a two-clock synchronizer on the interrupt line. An interrupt re- 
quest reaches the internal instruction execution unit two clocks after the INTR pin is asserted if 
proper setup is provided to the first stage of the synchronizer. 


There is no special logic in the interrupt path other than the synchronizer. The INTR signal is lev- 
el sensitive and must remain active for the instruction execution unit to recognize it. The interrupt 
is not serviced by the Intel486 processor if the INTR signal does not remain active. 


The instruction execution unit looks at the state of the synchronized interrupt signal at specific 
clocks during the execution of instructions (if interrupts are enabled). These specific clocks are 
at instruction boundaries, or iteration boundaries in the case of string move instructions. Inter- 
rupts are accepted at these boundaries only. 


An interrupt must be presented to the Intel486 processor INTR pin three clocks before the end of 
an instruction for the interrupt to be acknowledged. Presenting the interrupt three clocks before 
the end of an instruction allows the interrupt to pass through the two-clock synchronizer, leaving 
one clock to prevent the initiation of the next sequential instruction and begin interrupt service. 
If the interrupt is not received in time to prevent the next instruction, it will be accepted at the end 
of the next instruction, assuming INTR is still held active. 


The longest latency between when an interrupt request is presented on the INTR pin and when 
the interrupt service begins is determined as follows: 


longest instruction used + the two clocks for synchronization + one clock required to 
vector into the interrupt service microcode. 
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9.3.2 NMI Logic 


The NMI pin has a synchronizer much like that used on the INTR line. The NMI logic is other- 
wise different from that of the maskable interrupt. 


NMLis edge triggered, as opposed to the level triggered INTR signal. The rising edge of the NMI 
signal is used to generate the interrupt request. The NMI input need not remain active until the 
interrupt is actually serviced. The NMI pin must remain active only for a single clock if the re- 
quired setup and hold times are met. NMI operates properly if it is held active for an arbitrary 
number of clocks. 


The NMI input must be held inactive for at least four clocks after it is asserted to reset the edge 
triggered logic. A subsequent NMI may not be generated if the NMI is not held inactive for at 
least four clocks after being asserted. 


The NMI input is internally masked when the NMI routine is entered. The NMI input remains 
masked until an IRET (return from interrupt) instruction is executed. Masking the NMI signal 
prevents recursive NMI calls. If another NMI occurs while the NMI is masked off, the pending 
NMI is executed after the current NMI is done. Only one NMI can be pending while NMI is 
masked. 


9.3.3 SMI# Logic 


SMI# is edge triggered like NMI, but the interrupt request is generated on the falling-edge. SMI# 
is an asynchronous signal, but must meet setup and hold times ty9 and t; in order to guarantee 
recognition on a specific clock. The SMI# input need not remain active until the interrupt is ac- 
tually serviced. The SMI# input only needs to remain active for a single clock if the required setup 
and hold times are met. SMI# also works correctly if it is held active for an arbitrary number of 
clocks. 


The SMI# input must be held inactive for at least four clocks after it is asserted to reset the edge 
triggered logic. A subsequent SMI# might not be recognized if the SMI# input is not held inactive 
for at least four clocks after being asserted. 


SMI#, like NMI, is not affected by the IF bit in the EFLAGS register and is recognized on an 
instruction boundary. An SMI# does not break locked bus cycles. SMI# has a higher priority than 
NMI and is not masked during an NMI. 


After the SMI# interrupt is recognized, the SMI# signal is masked internally until the RSM in- 
struction is executed and the interrupt service routine is complete. Masking the SMI# prevents 
recursive SMI# calls. The SMI# input must be de- asserted for at least four clocks to reset the 
edge triggered logic. If another SMI# occurs while the SMI# is masked, the pending SMI# is rec- 
ognized and executed on the next instruction boundary after the current SMI# completes. This 
instruction boundary occurs before execution of the next instruction in the interrupted application 
code, resulting in back-to-back SMM handlers. Only one SMI# can be pending while SMI# is 
masked. 


The SMI# signal is synchronized internally and should be asserted at least three CLK periods pri- 
or to asserting the RD Y# signal to guarantee recognition on a specific instruction boundary. This 
is important for servicing an I/O trap with an SMI# handler. 
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9.3.4 STPCLK# Logic 


STPCLK# is level triggered and active low. STPCLK# is an asynchronous signal, but must re- 
main active until the processor issues the Stop Grant bus cycle. STPCLK# may be de-asserted at 
any time after the processor generates the Stop Grant bus cycle. When the processor enters the 
Stop Grant state, the internal pull-up resistor of STPCLK#, CLKMUL (for IntelDX4 processor), 
and RESERVED# are disabled to reduce processor power consumption. The STPCLK# input 
must be driven high (not floated) in order to exit the Stop Grant state. After RDY# or BRDY# is 
returned active for the Stop Grant bus cycle, STPCLK# must be de-asserted for a minimum of 
five clocks before being asserted again. 


When the processor recognizes a STPCLK# interrupt, the processor stops execution on the next 
instruction boundary (unless superseded by a higher priority interrupt) stops the prefetch unit, 
empties all internal pipelines and the write buffers, generates a Stop Grant bus cycle, and stops 
the internal clock. At this point, the processor is in the Stop Grant state. 


The processor cannot respond to a STPCLK# request from an HLDA state because it cannot emp- 
ty the write buffers and, therefore, cannot generate a Stop Grant cycle. 


The rising edge of STPCLK# tells the processor that it can return to program execution at the in- 
struction following the interrupted instruction. 


Unlike the normal interrupts, INTR and NMI, the STPCLK# interrupt does not initiate acknowl- 
edge cycles or interrupt table reads. The STPCLK# order of priority among external interrupts is 
shown in Section 3.15.6, “Interrupt and Exception Priorities.”. 


9.4 WRITE BUFFERS 


The Intel486 processor contains four write buffers to enhance the performance of consecutive 
writes to memory. The buffers can be filled at a rate of one write per clock until all buffers are 
filled. 


When all four buffers are empty and the bus is idle, a write request propagates directly to the ex- 
ternal bus, bypassing the write buffers. If the bus is not available at the time the write is generated 
internally, the write is placed in the write buffers and propagates to the bus as soon as the bus 
becomes available. The write is stored in the on-chip cache immediately if the write is a cache hit. 


Writes are driven onto the external bus in the same order in which they are received by the write 
buffers. Under certain conditions, a memory read can go onto the external bus before the memory 
writes pending in the buffer, even though the writes occurred earlier in the program execution. 


A memory read is reordered in front of all writes in the buffers only under the following condi- 
tions: If all writes pending in the buffers are cache hits and the read is a cache miss. Under these 
conditions, the Intel486 processor does not read from an external memory location that needs to 
be updated by one of the pending writes. 
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Reordering of a read with the writes pending in the buffers can only occur once before all the 
buffers are emptied. Reordering read once maintains cache consistency. Consider the following 
example: The processor writes to location X. Location X is in the internal cache, so it is updated 
there immediately. However, the bus is busy, so the write out to main memory is buffered (see 
Figure 9-5). Under these conditions, any reads to location X are cache hits and the most up-to- 
date data is read. 


Intel486™ Write Main 
Processor Cache Buffer Memory 
Ww 
New X New X | New Data X 
Data X Data X Y | New Data Y 
Z 


Figure 9-5. Reordering of a Reads with Write Buffers 


The next instruction causes a read to location Y. Location Y is not in the cache (a cache miss). 
Because the write in the write buffer is a cache hit, the read is reordered. When location Y is read, 
it is put into the cache. The possibility exists that location Y will replace location X in the cache. 
If this is true, location X would no longer be cached (see Figure 9-6). 


Intel486™ Write Main 
Processor Cache Buffer Memory 
Ww 
New X 
X| Data Y Xx 
Data X ¥ Bates, 
Z 


Figure 9-6. Reordering of a Reads with Write Buffers 


Cache consistency has been maintained up to this point. If a subsequent read is to location X (now 
a cache miss) and it was reordered in front of the buffered write to location X, stale data would 
be read. This is why only one read is allowed to be reordered. Once a read is reordered, all writes 
in the write buffer are flagged as cache misses to ensure that no more reads are reordered. Because 
one of the conditions to reorder a read is that all writes in the write buffer must be cache hits, no 
further reordering is allowed until all flagged writes propagate to the bus. Similarly, if an invali- 
dation cycle is run, all entries in the write buffer are flagged as cache misses. 


In multiple processor systems and/or systems using DMA techniques such as bus snooping, 
locked semaphores should be used to maintain cache consistency. 
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9.4.1. Write Buffers and I/O Cycles 
Input/Output (I/O) cycles must be handled in a different manner by the write buffers. 


I/O reads are never reordered in front of buffered memory writes. This ensures that the Intel486 
processor updates all memory locations before reading status from an I/O device. 


The Intel486 processor never buffers single I/O writes. When processing an OUT instruction, in- 
ternal execution stops until the I/O write completes on the external bus. This allows time for the 
external system to drive an invalidate into the Intel486 processor or to mask interrupts before the 
processor progresses to the instruction following OUT. REP OUTS instructions are buffered. 


A read cycle must be generated explicitly to a non-cacheable location in memory to guarantee 
that a read bus cycle is performed. This read is not allowed to proceed to the bus until after the 
I/O write has completed because I/O writes are not buffered. The I/O device has time to recover 
to accept another write during the read cycle. 


9.4.2. Write Buffers on Locked Bus Cycles 


Locked bus cycles are used for read-modify-write accesses to memory. During a read-modify- 
write access, a memory base variable is read, modified and then written back to the same memory 
location. It is important that no other bus cycles, generated by other bus masters or by the Intel486 
processor itself, be allowed on the external bus between the read and write portion of the locked 
sequence. 


During a locked read cycle, the Intel486 processor always accesses external memory; it does not 
look for the location in the on-chip cache. For write cycles, data is written to the internal cache 
(if cache hit) and the external memory. All data pending in the Intel486 processor’s write buffers 
is written to memory before a locked cycle is allowed to proceed to the external bus. 


The Intel486 processor asserts LOCK# after the write buffers are emptied during a locked bus 
cycle. With LOCK# asserted, the processor reads the data, operates on the data, and places the 
results in a write buffer. The contents of the write buffer are then written to external memory. 
LOCK# becomes inactive after the write part of the locked cycle. 


9.5 Reset and Initialization 


The Intel486 processor has a built in self test (BIST) that can be run during reset. BIST is invoked 
when the AHOLD pin is asserted for one clock before and de-asserted one clock after RESET is 
de-asserted. RESET must be active for 15 clocks with or without BIST being enabled. To ensure 
proper results, neither FLUSH# nor SRESET can be asserted while BIST is executing. Refer to 
Appendix B, “Testability,” for information on Intel486 processor testability. 


The Intel486 processor registers have the values shown in Table 9-8 after RESET is performed. 
The EAX register contains information on the success or failure of the BIST if the self test is ex- 
ecuted. The DX register always contains a component identifier at the conclusion of RESET. The 
upper byte of DX (DH) contains 04 and the lower byte (DL) contains the revision identifier (see 
Table 9-9). 


RESET forces the Intel486 processor to terminate all execution and local bus activity. No instruc- 
tion or bus activity occurs as long as RESET is active. 
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All entries in the cache are invalidated by RESET. 


9.5.1. Floating-Point Register Values 


In addition to the register values listed above, IntelDX2 and IntelDX4 processors have the float- 
ing-point register values shown in Table 9-10. 


If the BIST is performed, the floating-point registers are initialized as if the FINIT/FNINIT (ini- 
tialize processor) instruction were executed. If the BIST is not executed, the floating-point regis- 
ters are unchanged. 


The Intel486 processor starts executing instructions at location FFFFFFFOH after RESET. When 
the first Inter Segment Jump or Call is executed, address lines A[31:20] drop low for CS-relative 
memory cycles, and the Intel486 processor executes instructions only in the lower 1 Mbyte of 
physical memory. This allows the system designer to use ROM at the top of physical memory to 
initialize the system and take care of RESETs. 


Table 9-8. Register Values after Reset 


Register ee vec 
EAX Zero (Pass) Undefined 
ECX Undefined Undefined 
EDX 0400 + 0400 + 

Revision ID Revision ID 
EBX Undefined Undefined 
ESP Undefined Undefined 
EBP Undefined Undefined 
ESI Undefined Undefined 
EDI Undefined Undefined 
EFLAGS 00000002h 00000002h 
EIP OFFFOh OFFFOh 
ES 0000h 0000h 
cs FOOOh FOOOh 
Ss 0000h 0000h 
DS 0000h 0000h 
FS 0000h 0000h 
GS 0000h 0000h 
IDTR Base = 0, Base = 0, 
Limit = 3FFh Limit = 3FFh 
CRO 60000010h 60000010h 
DR7 00000000h 00000000h 
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Table 9-9. Floating-Point Values after Reset 


Register een Thelen 
CW 037Fh Unchanged 
SW 0000h Unchanged 
TW FFFFh Unchanged 
FIP 00000000h Unchanged 
FEA 00000000h Unchanged 
FCS 0000h Unchanged 
FDS 0000h Unchanged 
FOP 000h Unchanged 
FSTACK Undefined Unchanged 


9.5.2 Pin State During Reset 


The Intel486 processor recognizes and can respond to HOLD, AHOLD, and BOFF# requests re- 
gardless of the state of RESET. Thus, even though the processor is in reset, it can float its bus in 
response to any of these requests. 


While in reset, the Intel486 processor bus is in the state shown in Figure 9-7 if the HOLD, 
AHOLD and BOFF# requests are inactive. Note that the address (A[31:2], BE[3:0]#) and cycle 
definition (M/IO#, D/C#, W/R#) pins are undefined from the time reset is asserted until the start 
of the first bus cycle. All undefined pins (except FERR#) assume known values at the beginning 
of the first bus cycle. The first bus cycle is always a code fetch to address FRFFFFFFOH. 
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See notes on next page. 


Figure 9-7. Pin States During RESET 
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Notes to Figure 9-7: 

1. RESET is an asynchronous input. tpg must be met only to guarantee recognition on a specific clock 
edge. 

2. When A20M¢ is driven synchronously, it must be driven high (inactive) for the CLK edge prior to the 
falling edge of RESET to ensure proper operation. A2Z0M# setup and hold times must be met. 

3. When A20M# is driven asynchronously, it should be driven low (active) for two CLKs prior to and two 
CLKs after the falling edge of RESET to ensure proper operation. 

4. When FLUSH# is driven synchronously, it must be driven low (high) for the CLK edge prior to the 
falling edge of RESET to invoke the three-state Output Test Mode. All outputs are guaranteed three- 
stated within 10 CLKs of RESET being de-asserted. FLUSH# setup and hold times must be met. 

5. When FLUSH# is driven asynchronously, it must be driven low (active) for two CLKs prior to and two 
CLKs after the falling edge of RESET to invoke the three-state Output Test Mode. All outputs are 
guaranteed three-stated within 10 CLKs of RESET being de-asserted. 

6. AHOLD should be driven high (active) for the CLK edge prior to the falling edge of RESET to invoke 
the Built-in Self Test (BIST). AHOLD setup and hold times must be met. 

7. Hold is recognized normally during RESET. On power-up, HLDA is indeterminate until RESET is 
recognized by the processor. 

8. 15 CLKs RESET pulse width for warm resets. Power-up resets require RESET to be asserted for at 
least 1 ms after Vcc and CLK are stable. 

9. WB/WT# should be driven high for at least one CLK before the falling edge of RESET and at least one 
CLK after the falling edge of RESET to enable the Enhanced Bus mode. Standard Bus mode is 
enabled if WB/WT# is sampled low or left floating at the falling edge of RESET. 

10. The system may sample HITM# to detect the presence of the Enhanced Bus mode. If HITM# is high 
for one CLK after RESET is inactive, Enhanced Bus mode is present. 


9.5.2.1 Controlling the CLK Signal in the Processor during Power On 


Intel does not specify the power on requirements of the Intel486 processor allowable CLK input 
during the power on sequence. Clocking the processor before Vcc reaches its normal operating 
level can cause unpredictable results on Intel486 processors. The information in this section re- 
flects what Intel considers a good clock design. 


Intel strongly recommends that system designers ensure that a clock signal is not presented to the 
Intel486 processor until Vcc has stabilized at its normal operating level. This design recommen- 
dation can easily be met by gating the clock signal with a POWERGOOD signal. The POWER- 
GOOD signal should reflect the status of Vcc at the Intel486 processor (which may be different 
from the power supply status in designs that provide power to the processor using a voltage reg- 
ulator or converter). 


Most clock synthesizers and some clock oscillators contain on-board gating logic. If external gat- 
ing logic is implemented, it should be done on the original clock signal output from the clock os- 
cillator/synthesizer. Gating the clock to the processor independently of the clock to the rest of the 
motherboard causes clock skew, which may violate processor or chipset timing requirements. If 
the clock signal to the motherboard is enabled with aPOWERGOOD signal, verify that the moth- 
erboard logic does not require a clock input prior to this POWERGOOD signal. Some chip sets 
also gate the clock to the processor only after a POWERGOOD signal, which inherently meets 
the requirements of this design. Designs should implement the design as described in this section 
to maintain maximum flexibility with all Intel486 processor steppings. 
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9.5.2.2 FERR# Pin State During Reset for IntelDX2™ and IntelDX4™ Processors 


FERR# reflects the state of the ES (error summary status) bit in the floating-point unit status 
word. The ES bit is initialized when the floating-point unit state is initialized. The floating-point 
unit’s status word register can be initialized by BIST or by executing the FINIT/FNINIT instruc- 
tion. Thus, after reset and before executing the first FINIT or FNINIT instruction, the values of 
the FERR# and the numeric status word register bits 7:0 depend on whether or not BIST is per- 
formed. Table 9-10 shows the state of FERR# signal after reset and before the execution of the 
FINIT/FNINIT instruction. 


Table 9-10. FERR# Pin State after Reset and before FP Instructions 


BIST Performed FERR# Pin FPU Status Word Register Bits 7:0 
YES Inactive (High) Inactive (Low) 
NO Undefined (Low or High) Undefined (Low or High) 


After the first FINIT or FNINIT instruction, FERR# and the FPU status word register bits (7:0) 
are inactive, irrespective of the Built-In Self-Test (BIST). 


9.5.2.3 Power Down Mode (In-circuit Emulator Support) 


The Power Down mode on the Intel486 processor, when initiated by the Reserved# signal, reduc- 
es the power consumption of the Intel486 processor, as well as forces all of its output signals to 
be three-stated. The RERSER VED# pin on the Intel486 processor is used for enabling the Power 
Down mode. 


When the RESERVED# pin is driven active upon power-up, the Intel486 processor’s bus is float- 
ed immediately. The Intel486 processor enters Power Down mode when the RESERVED# pin is 
sampled asserted in the clock before the falling edge of RESET. The RESERVED# pin has no 
effect on the power down status, except during this edge. The Intel486 processor then remains in 
the Power Down mode until the next time the RESET signal is activated. For warm resets, with 
the upgrade processor in the system, the Intel486 processor remains three-stated and re-enters the 
Power Down mode once RESET is de-asserted. Similarly for power-up resets, if the upgrade pro- 
cessor is not taken out of the system, the Intel486 processor three-states its outputs upon sensing 
the RESERVED# pin active and enters the Power Down Mode after the falling edge of RESET. 


9.6 CLOCK CONTROL 


The Intel486 processor provides an interrupt mechanism (STPCLK#) that allows system hard- 
ware to control the power consumption of the processor by stopping the internal clock (output of 
the PLL) to the processor core in a controlled manner. This low-power state is called the Stop 
Grant state. In addition, the STPCLK# interrupt allows the system to change the input frequency 
within the specified range or completely stop the CLK input frequency (an input to the PLL). If 
the CLK input is stopped completely, the processor enters into the Stop Clock state—the lowest 
power state. 


See Section 9.6.4.2 and Section 9.6.4.3, for a detailed description of the Stop Grant and Stop 
Clock states, respectively. 
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9.6.1 Stop Grant Bus Cycles 


A special Stop Grant bus cycle is driven to the bus after the processor recognizes the STPCLK# 
interrupt. The definition of this bus cycle is the same as the HALT cycle definition for the stan- 
dard Intel486 processor, with the exception that the Stop Grant bus cycle drives the value 0000 
0010H on the address pins. The system hardware must acknowledge this cycle by returning 
RDY# or BRDY#. The processor does not enter the Stop Grant state until either RDY# or 
BRDY# has been returned. 


The Stop Grant bus cycle is defined as follows: 


M/IO# = 0, D/C# = 0, W/R# = 1, address bus = 0000 0010H (A, = 1), BE3:0# = 1011, data bus 
= undefined 


The latency between a STPCLK# request and the Stop Grant bus cycle depends on the current 
instruction, the amount of data in the processor write buffers, and the system memory perfor- 
mance (see Figure 9-8). 


I I I l I I I 
CLK [ 


Figure 9-8. Stop Clock Protocol 


9.6.2 Pin State During Stop Grant 


During the Stop Grant state, most output and input/output signals of the processor maintain their 
previous condition (the level they held when entering the Stop Grant state). The data and data par- 
ity signals are three-stated. In response to HOLD being driven active during the Stop Grant state 
(when the CLK input is running), the processor generates HLDA and three-states all output and 
input/output signals that are three-stated during the HOLD/HLDA state. After HOLD is de-as- 
serted, all signals return to their prior state before the HOLD/HLDA sequence. 


In order to achieve the lowest possible power consumption during the Stop Grant state, the system 
designer must ensure that the input signals with pull-up resistors are not driven low and the input 
signals with pull-down resistors are not driven high. (Refer to the Quick Pin Reference in the 
datasheets for signals with internal pull-up and pull-down resistors.) 
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All inputs except the data bus pins must be driven to the power supply rails to ensure the lowest 
possible current consumption during Stop Grant or Stop Clock states. For compatibility with fu- 
ture processors, data pins should be driven low to achieve the lowest possible power consump- 
tion. Pull-down resistors/bus keepers are needed to minimize leakage current. 


If HOLD is asserted during the Stop Grant state, all pins that are normally floated during HLDA 
are still floated by the processor. The floated pins should be driven to a low level (see Table 9-11). 


Table 9-11. Pin State during Stop Grant Bus State 


Signal Type State 
A[3:2] oO Previous state 
A[31:4] VO Previous state 
D[31:0] VO Floated 
BE[3:0]# oO Previous state 
DP[3:0] VO Floated 
W/R#, D/C#, M/1O# oO Previous state 
ADS# oO Inactive 
LOCK#, PLOCK# O Inactive 
BREQ O Previous state 
HLDA O As per HOLD 
BLAST# oO Previous state 
FERR# oO Previous state 
PCD, PWT oO Previous state 
PCHK# oO Previous state 
PWT, PCD O Previous state 
SMIACT# oO Previous state 


9.6.3 Write-Back Enhanced IntelDX4™ Processor Pin States During Stop Grant 
State 


During the Stop Grant state, most output signals of the processor maintain their previous condi- 
tion, which is the level they held when entering the Stop Grant state. The data bus and data parity 
signals also maintain their previous state. In response to HOLD being driven active during the 
Stop Grant state when the CLK input is running, the Write-Back Enhanced IntelDX4 processor 
generates HLDA and three-states all output and input/output signals that are three-stated during 
the HOLD/HLDA state. After HOLD is de-asserted, all signals return to the state they were in 
prior to the HOLD/HLDA sequence. 
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All inputs should be driven to the power supply rails to ensure the lowest possible current con- 
sumption during the Stop Grant or Stop Clock states (see Table 9-12). 


Table 9-12. Write-Back Enhanced IntelDX4™ Processor Pin States during Stop Grant Bus 


Cycle 
Signal Type State 

A[3:2] oO Previous state 
A[31:4] VO Previous state 
D[31:0] VO Previous state 
BE[3:0]# oO Previous state 
DP[3:0] VO Previous state 
W/R#, D/C#, M/1O# O Previous state 
ADS# O Inactive (high) 
LOCK#, PLOCK# O Inactive (high) 
BREQ O Previous state 

HLDA O As per HOLD 
BLAST# oO Previous state 
FERR# O Previous state 
PCHK# oO Previous state 
PWT, PCD oO Previous state 
CACHE# O Inactive) (high) 
HITM# O Inactive) (high) 
SMIACT# oO Previous state 

NOTES: 
1. For the case of snoop cycles (via EADS#) during Stop Grant state, both HITM# and 
CACHE# may go active depending on the snoop hit in the internal cache. 


2. During Stop Grant state, AHOLD, HOLD, BOFF# and EADS# are serviced normally. 


The Write-Back Enhanced IntelDX4 processor has bus keepers features. The data bus and data 
parity pins have bus keepers that maintain the previous state while in the Stop Grant state. Exter- 
nal resistors are no longer required, which prevents excess current during the Stop Grant state. (If 
external resistors are present, they should be strong enough to “flip” the bus hold circuitry and 
eliminate potential DC paths. Alternately, “weak” resistors may be added to prevent excessive 
current flow.) 


In order to obtain the lowest possible power consumption during the Stop Grant state, system de- 
signers must ensure that the input signals with pull-up resistors are not driven low, and the input 
signals with pull-down resistors are not driven high. 
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9.6.4 Clock Control State Diagram 


The following state descriptions and diagram show the state transitions during a Stop Clock cycle 
for the Intel486 processor. (Refer to Figure 9-9 for a Stop Clock state diagram.) Refer to Section 
9.6.5 for Write-Back Enhanced Intel486 processors Clock Control State specifics. 


9.6.4.1 Normal State 


This is the normal operating state of the processor. 


9.6.4.2 Stop Grant State 


The Stop Grant state provides a fast wake-up state that can be entered by simply asserting the ex- 
ternal STPCLK# interrupt pin. Once the Stop Grant bus cycle has been placed on the bus, and 
either RDY# or BRD Y# is returned, the processor is in this state (depending on the CLK input 
frequency). The processor returns to the normal execution state approximately 10-20 clock peri- 
ods after STPCLK# has been de-asserted. 


While in the Stop Grant state, the pull-up resistors on STPCLK#, CLKMUL (for the IntelDX4 
processor) and RESERVED# are disabled internally. The system must continue to drive these in- 
puts to the state they were in immediately before the processor entered the Stop Grant state. For 
minimum processor power consumption, all other input pins should be driven to their inactive 
level while the processor is in the Stop Grant state. 


A RESET or SRESET brings the processor from the Stop Grant state to the Normal state. The 
processor recognizes the inputs required for cache invalidations (HOLD, AHOLD, BOFF# and 
EADS#), as explained later in this section. The processor does not recognize any other inputs 
while in the Stop Grant state. Input signals to the processor are not recognized until one CLK after 
STPCLK# is de-asserted (see Figure 9-10). 


While in the Stop Grant state, the processor does not recognize transitions on the interrupt signals 
(SMI#, NMI, and INTR). Driving an active edge on either SMI# or NMI does not guarantee rec- 
ognition and service of the interrupt request following exit from the Stop Grant state. However, 
if one of the interrupt signals (SMI#, NMI, or INTR) is driven active while the processor is in the 
Stop Grant state, and held active for at least one CLK after STPCLK# is de-asserted, the corre- 
sponding interrupt is serviced. The Intel486 processor requires INTR to be held active until the 
processor issues an interrupt acknowledge cycle in order to guarantee recognition (see Figure 
9-10). 


When the processor is in the Stop Grant state, the system can stop or change the CLK input. When 
the CLK input to the processor is stopped or changed, the Intel486 processor requires the CLK 
input to be held at a constant frequency for a minimum of | ms before de-asserting STPCLK#. 
This 1-ms time period is necessary so that the PLL can stabilize, and it must be met before the 
processor returns to the Stop Grant state. 


NOTE 


On the Ultra-Low Power Intel486 family of processors, the 1 ms settling 
period is not necessary due to the digital delay line based clock circuit. 


| 9-37 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


HALT asserted and 


4. Auto HALT Power Down State HALT Bus cycle 1. Normal State 
enerated Reset 
CLK Running 2 ‘i fe ; q— 
log ~ 100 mA ormal Execution 
INTR, NMI, SMI#, 
RESET, SRESET 
STPCLK# asserted and 
STPCLK# de-asserted and Stop Grant Bus cycle 
HALT Bus cycle generated generated 
EADS# 
STPCLK# asserted and 
Stop Grant Bus cycle generated 
5. Stop Clock Snoop State EADS# 2. Stop Grant State 
One Clock Powerup Clock Running 
Perform Cache Invalidation log -— 20 MA— 50 mA 


START CLK 
Stop CLK + PLL Startup LATENCY 


3. Stop Clock State 


Internal Powerdown 


CLK Changed* 
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range or completely stop the CLK input frequency (input to PLL). 


Figure 9-9. Intel486™ Processor Family Stop Clock State Machine 


The processor generates a Stop Grant bus cycle only when entering that state from the Normal or 
the Auto HALT Power Down state. When the processor enters the Stop Grant state from the Stop 
Clock state or the Stop Clock Snoop state, the processor does not generate a Stop Grant bus cycle. 


9.6.4.3 Stop Clock State 


Stop Clock state is entered from the Stop Grant state by stopping the CLK input (either logic high 
or logic low). None of the processor input signals should change state while the CLK input is 
stopped. Any transition on an input signal (with the exception of INTR, NMI and SMI#) before 
the processor has returned to the Stop Grant state results in unpredictable behavior. If INTR is 
driven active while the CLK input is stopped, and held active until the processor issues an inter- 
rupt acknowledge bus cycle, it is serviced in the normal manner. The system design must ensure 
that the processor is in the correct state prior to asserting cache invalidation or interrupt signals 
to the processor. 
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A: Earliest time at which NMI or SMI# is recognized. 


Figure 9-10. Recognition of Inputs when Exiting Stop Grant State 


The processor returns to the Stop Grant state after the CLK input has been running at a constant 
frequency for a period of time equal to the PLL startup latency (see Section 9.6.4.2). The CLK 
input can be restarted to any frequency between the minimum and maximum frequency listed in 
the AC timing specifications. 


9.6.4.4 Auto HALT Power Down State 


The execution of a HALT instruction also causes the processor to automatically enter the Auto 
HALT Power Down state. The processor issues a normal HALT bus cycle before entering this 
state. The processor transitions to the Normal state on the occurrence of INTR, NMI, SMI#, RE- 
SET, or SRESET. 


The system can generate a STPCLK# while the processor is in the Auto HALT Power Down 
state. The processor generates a Stop Grant bus cycle when it enters the Stop Grant state from the 
HALT state. 


When the system de-asserts the STPCLK# interrupt, the processor returns execution to the HALT 
state. The processor generates a new HALT bus cycle when it re-enters the HALT state from the 
Stop Grant state. 
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9.6.4.5 Stop Clock Snoop State (Cache Invalidations) 


When the processor is in the Stop Grant state or the Auto HALT Power Down state, the processor 
recognizes HOLD, AHOLD, BOFF# and EADS# for cache invalidation. When the system asserts 
HOLD, AHOLD, or BOFF?#, the processor floats the bus accordingly. When the system then as- 
serts EADS#, the processor transparently enters the Stop Clock Snoop state and powers up for 
one full core clock in order to perform the required cache snoop cycle. It then re-freezes the clock 
to the processor core and returns to the previous state. The processor does not generate a bus cycle 
when it returns to the previous state. 


A FLUSH# event during the Stop Grant state or the Auto HALT Power Down state is latched and 
acted upon by asserting the internal FLUSH# signal for one clock upon re-entering the Normal 
state. 


9.6.4.6 Auto Idle Power Down State 


When the processor is known to be truly idle and waiting for RDY# or BRDY# from a memory 
or I/O bus cycle read, the Intel486 processor reduces its core clock rate to equal that of the exter- 
nal CLK frequency without affecting performance. When RDY# or BRDY# is asserted, the pro- 
cessor returns to clocking the core at the specified multiplier of the external CLK frequency. This 
functionality is transparent to software and external hardware. 


9.6.5 Write-Back Enhanced IntelDX4™ Processor Clock Control State Diagram 


Figure 9-11 shows the state transitions during Stop Clock for the Write-Back Enhanced IntelDX4 
processor. 


NOTE 
The Stop Clock State Machine in the Standard bus configuration is identical to 
that of other Intel486 processors. (See Section 9.6.4, “Clock Control State 
Diagram” on page 9-37.) 
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9.6.5.1 Normal State 


This is the normal operating state of the processor. When the processor is executing program/in- 
struction and the STPCLK# pin is not asserted, the processor is said to be in its Normal state. 


4. Auto HALT Power Down State 1. Normal State 
CLK Running HALT Reset 
Halt Bus Cycle Generated Normal Execution 
Icc approximately 100 yA SE EPRENTay EERE <i All Clocks Runnin 
CE SRR 4 INTR, NMI, SMI#, 
RESET, SRESET 
STPCLK# asserted 
STPCLK# de-asserted Paks iae 
EADS# Stop Grant 
bus cycles 
STPCLK# asserted generated 


5. Stop Clock Snoop State 2. Stop Grant State 


Clock Running 
loc Approximately 20 mA — 50 mA 


Clock Powerup 
Write through: Cache Invalidation 
Write back: Write, Invalidation 


START CLK 
Stop CLK + PLL Startup LATENCY 
FLUSH# 

6. Auto HALT Power Down 3. Stop Clock State 

Flush State 
Write through: Cache Invalidation Internal Powerdown 
Write back: Write-back, Invalidation, CLK Stopped 

2 flush acknowledge cycles Ioc Approximately 100 pA 


Figure 9-11. Write-Back Enhanced IntelDX4™ Processor Stop Clock State Machine 
(Enhanced Bus Configuration) 
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9.6.5.2 Stop Grant State 


For minimum processor power consumption, all other input pins should be driven to their inactive 
level while the processor is in the Stop Grant state except for the data bus, data parity, WB/WT# 
and INV pins. WB/WT# should be driven low and INV should be driven high. 


In both the Standard mode and Enhanced mode, the following conditions exist: 
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A RESET, SRESET or de-assertion of STPCLK# brings the processor from the Stop Grant 
state to the Normal state. 


While in the Stop Grant state, the processor does not recognize transitions on the interrupt 
signals (SMI#, NMI, and INTR). This means SMI#, NMI, and INTR are not Stop Break 
events. The external logic should de-assert STPCLK# before issuing interrupts, or if an 
interrupt is asserted it should be kept asserted for at least one clock after STPCLK# is 
removed. (Note that the Write-Back Enhanced IntelDX4 processor requires that INTR be 
held active until the processor issues an interrupt acknowledge cycle in order to guarantee 
recognition). 


FLUSH# is not a Stop Break event. But if FLUSH# is asserted during the Stop Grant state, 
it is latched by the Write-Back Enhanced IntelDX4 processor and serviced later when 
STPCLK# is de-asserted. 


The processor latches and responds to the inputs BOFF#, EADS#, AHOLD, and HOLD. 
The processor does not recognize any other inputs while in the Stop Grant state except 
FLUSH#. Other input signals to the processor are not recognized until the CLK following 
the CLK in which STPCLK# is de-asserted (see Figure 9-11). 


The processor generates a Stop Grant bus cycle only when entering that state from the 
Normal or the Auto HALT Power Down state. The Stop Grant bus cycle is not generated 
when the processor enters the Stop Grant state from the Stop Clock state or the Stop Clock 
Snoop state. 


The processor does not enter the Stop Grant state until all the pending writes are completed, 
all pending interrupts are serviced, and the processor is idle. 
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9.6.5.3 Stop Clock State 


The Stop Clock state is the lowest power consumption mode of the Intel486 processor, because 
it allows removal of the external clock. It also has the longest latency for returning to normal 
state. The Stop Clock state is entered from the Stop Grant state by stopping the CLK input. In the 
Stop Clock state, total processor power consumption drops to 100 A, which is approximately 
200-250 times lower than the Stop Grant state. None of the processor input signals should change 
state while the CLK input is stopped. Any transition on an input signal before the processor has 
returned to the Stop Grant state results in unpredictable behavior. If INTR is driven active, it must 
remain active until the processor issues an interrupt acknowledge cycle. 


In the Stop Clock state, the processor is dormant. It does not respond to transitions on any of the 
input pins, including snoops, flushes and interrupts. It is recommended that this mode only be en- 
tered if the processor cache is coherent with main memory and the processor is not processing 
interrupts. If this mode is entered with a dirty cache, no alternate master cycles can be allowed 
while the processor is in the Stop Clock state. 


The processor returns to the Stop Grant state after the CLK input has been running at a constant 
frequency for a period of time equal to the PLL startup latency. The CLK input can be restarted 
to any frequency between the minimum and maximum frequency listed in the AC timing speci- 
fications. 


In Enhanced Bus mode, if the processor is taken into the Stop Clock state with a dirty cache, al- 
ternate bus master cycles are not allowed while the processor remains in the Stop Clock state. In 
order to take the processor into the Stop Clock state with a clean cache, the cache must be flushed. 
During the time the cache is being flushed, the system must block interrupts to the processor. 
With all interrupts other than STPCLK# blocked, the processor does not write into the cache dur- 
ing the time from the completion of the flush and time it enters the Stop Grant state. This is nec- 
essary for the cache to be coherent. To ensure cache coherency, the system should drive KEN# 
inactive from the time the flush starts until the Stop Grant cycle is issued. The system can then 
put the processor in the Stop Clock state by stopping the clock. 


If the processor is already in the Stop Grant state and entering the Stop Clock state is desired, the 
system must de-assert STPCLK# before flushing the cache in order to ensure cache coherency. 
The five-clock de-assertion specification for STPCLK# must also be met before the above se- 
quence can occur. 
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9.6.5.4 Auto HALT Power Down State 


Upon execution of a HALT instruction, the processor automatically enters a low power state 
called the Auto HALT Power Down state. The processor issues a normal HALT bus cycle when 
entering this state. Because interrupts are HALT break events, the processor transitions to the 
Normal state on the occurrence of INTR, NMI, SMI# or RESET (SRESET is also a HALT break 
event). If a FLUSH# occurs while the processor is in this state, the FLUSH# is serviced by tran- 
sitioning to the Stop Clock Flush state. After the FLUSH# is completed, the processor returns to 
the Auto HALT Power Down state. 


The system can generate a STPCLK# while the processor is in the Auto HALT Power Down 
state. The processor then generates a Stop Grant bus cycle and enters the Stop Grant state from 
the Auto HALT Power Down state. When the system de-asserts the STPCLK# interrupt, the pro- 
cessor returns to the Auto HALT Power Down state. The processor does not generate a new 
HALT bus cycle when it re-enters the Auto HALT Power Down state from the Stop Grant state. 


9.6.6 Stop Clock Snoop State (Cache Invalidations) 


When the processor is in the Stop Grant state or the Auto HALT Power Down state, the processor 
recognizes HOLD, AHOLD, BOFF#, and EADS# for cache invalidation. When the system as- 
serts HOLD, AHOLD, or BOFF?#, the processor floats the bus accordingly. When the system as- 
serts EADS#, the processor transparently enters the Stop Clock Snoop state and powers up in 
order to perform the required cache snoop cycle and write-back cycles. It then refreezes the CLK 
to the processor core and returns to the previous state (i.e., either the Stop Grant state or the Auto 
HALT Power Down state). The processor does not generate a bus cycle when it returns to the 
previous state. 


9.6.6.1 Auto HALT Power Down Flush State (Cache Flush) for the Write-Back 
Enhanced IntelDX4™ Processor 


When the Write-Back Enhanced IntelD X4 processor is in either Standard or Enhanced Bus mode, 
and a FLUSH# event occurs during Auto HALT Power Down state, the processor transitions to 
the Auto HALT Power Down Flush state. If the on-chip cache is configured as a write-back 
cache, the CLK to the processor core is turned on until all the dirty lines are written back, the 
cache is invalidated, and the two flush acknowledge cycles are completed. If the on-chip cache is 
configured as a write-through cache, the CLK to the processor core is turned on until the cache 
is invalidated. The processor then refreezes the CLK and returns to the previous state (i.e., the 
Auto HALT Power Down state). Auto HALT Power Down Flush state is entered only from the 
Auto HALT Power Down state and not from the Stop Grant state. 
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9.6.7. Supply Current Model for Stop Clock Modes and Transitions 


Figures 9-12 and 9-13 illustrate the effect of different Stop Clock state transitions on the supply 
current of the Intel486 processor. 


A. STPCLK# asserted, CLK not changed 


A’ STPCLK# de-asserted, CLK not changed 
_ Short (10CLK) latency. 


B_ CLK started, required PLL startup latency © Noi Sate 


to re-enter Stop Grant state. m Stop Grant State 
31 


© Stop Clock State 
C Stop CLK (from Stop Grant to Stop Clock) 
- 


,C CLK started, requires PLL startup latency to 


I 33/66 
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Figure 9-12. Supply Current Model for Stop Clock Modes and Transitions for the 
Intel486™ Processor 
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A. STPCLK# asserted, CLK not changed 


@ Normal State 
@ Stop Grant State 
© Stop Clock State 


A’ STPCLK# de-asserted, CLK not changed 
Short (10CLK) latency. 


B_ CLK started, required PLL startup latency 
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Figure 9-13. Supply Current Model for Stop Clock Modes and Transitions for the 
IntelDX4™ Processor 
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A. STPCLK# asserted, CLK not changed 


@ Normal State 
@ Stop Grant State 
© Stop Clock State 


A’ STPCLK# de-asserted, CLK not changed 
_ Short (8 CLK) latency. 


B CLK started, required DDL 8 CLK startup 
latency to re-enter Stop Grant state. 


C Stop CLK (from Stop Grant to Stop Clock) 
a 


C’ CLK started, requires DDL 8 CLK startup 
latency to re-enter Stop Grant state. 


25/50 
e 


8 MHz 25 MHz 33 MHz 


Figure 9-14. Supply Current Model for Stop Clock Modes and Transitions for the Ultra-Low 


Power [ntel486™ SX and Ultra-Low Power Intel486 GX Processors 


9-47 


intel. 


Bus Operation 


Chapter Contents 

10.1 Data Transfer Mechanism. ...........eecesceceeeceseeceeeeeereeeneeeeneees 10-1 
10.2 Bus Arbitration Logic oo. ceecseeseecreeteeeseeeseeerens 10-13 
10.3. Bus Functional Description... eee eee cseeeeeeeeeeeeees 10-16 
10.4 Enhanced Bus Mode Operation (Write-Back Mode) 


for the Write-Back Enhanced IntelIDX4™ Processor 


intel. 


CHAPTER 10 
BUS OPERATION 


All Intel486™ processors operate in Standard Bus (write-through) mode. However, when the in- 
ternal cache of the Write-Back Enhanced IntelDX4™ processor can be configured in write-back 
mode, the processor bus then operates in the Enhanced Bus mode, which is described in Section 
10.4. When the internal cache of the Write-Back Enhanced IntelDX4 processor is configured in 
write-through mode, the processor bus operates in Standard Bus mode, identical to the other em- 
bedded Intel486 processors. 


10.1 DATA TRANSFER MECHANISM 


All data transfers occur as a result of one or more bus cycles. Logical data operands of byte, word 
and doubleword lengths may be transferred without restrictions on physical address alignment. 
Data may be accessed at any byte boundary but two or three cycles may be required for unaligned 
data transfers. (See Section 10.1.2, “Dynamic Data Bus Sizing,” and Section 10.1.5, “Operand 
Alignment.”’) 


The Intel486 processor address signals are split into two components. High-order address bits are 
provided by the address lines, A[31:2]. The byte enables, BE[3:0]#, form the low-order address 
and provide linear selects for the four bytes of the 32-bit address bus. 


The byte enable outputs are asserted when their associated data bus bytes are involved with the 
present bus cycle, as listed in Table 10-1. Byte enable patterns that have a deasserted byte enable 
separating two or three asserted byte enables never occur (see Table 10-5 on page 10-8). All other 
byte enable patterns are possible. 


Table 10-1. Byte Enables and Associated Data and Operand Bytes 


Byte Enable Signal Associated Data Bus Signals 
BEO# D[7:0] (byte 0-least significant) 
BE1# D[15:8] (byte 1) 
BE2# D[23:16] (byte 2) 
BE3# D[31:24] (byte 3—most significant) 


Address bits AO and Al of the physical operand's base address can be created when necessary. 
Use of the byte enables to create AO and A1 is shown in Table 10-2. The byte enables can also 
be decoded to generate BLE# (byte low enable) and BHE# (byte high enable). These signals are 
needed to address 16-bit memory systems. (See Section 10.1.3, “Interfacing with 8-, 16-, and 32- 
Bit Memories.’’) 
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10.1.1 Memory and I/O Spaces 


Bus cycles may access physical memory space or I/O space. Peripheral devices in the system can 
be either memory-mapped, I/O-mapped, or both. Physical memory addresses range from 
00000000H to FFFFFFFFH (4 gigabytes). I/O addresses range from O0000000H to OOOOFFFFH 
(64 Kbytes) for programmed I/O. (See Figure 10-1.) 


Table 10-2. Generating A[31:0] from BE[3:0]# and A[31:A2] 


Intel486™ Processor Address Signals 


A31 through A2 
BE3# BE2# BE1# BEO# 
Physical Address 
A31 ies A2 Al AO 
A31 nF A2 0 0 X xX X 0 
A31 Ae A2 0 1 X xX 0 1 
A31 ee A2 1 0 X 0 1 1 
A31 af A2 1 1 0 1 1 1 
FFFFFFFFH 
Physical 
Memory 
4 Gbyte 
OOOOFFFFH \ Accessible 
Programmed 
00000000H 00000000H /O Space 
Physical Memory /O Space 
Space 


Figure 10-1. Physical Memory and I/O Spaces 
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10.1.1.1 Memory and I/O Space Organization 


The Intel486 processor datapath to memory and input/output (I/O) spaces can be 8, 16, or 32 bits 
wide. The byte enable signals, BE[3:0]#, allow byte granularity when addressing any memory or 
I/O structure, whether 8, 16, or 32 bits wide. 


The Intel486 processor includes bus control pins, BS 16# and BS8#, which allow direct connec- 
tion to 16- and 8-bit memories and I/O devices. Cycles of 32-, 16- and 8-bits may occur in any 
sequence, since the BS8# and BS16# signals are sampled during each bus cycle. 


NOTE 
The Ultra-Low Power Intel486 GX processor has a 16-bit external data bus. 
All data transfers are done on the low order data bits (D15-D0) and parity is 
generated and checked on pins DPO and DP1. For this reason, dynamic data 
bus sizing (using pins BS 16# and BS8#) is not supported. 


Memory and I/O spaces that are 32-bit wide are organized as arrays of four bytes each. Each four 
bytes consists of four individually addressable bytes at consecutive byte addresses (see 
Figure 10-2). The lowest addressed byte is associated with data signals D[7:0]; the highest-ad- 
dressed byte with D[3 1:24]. Each 4 bytes begin at an address that is divisible by four. 


32-Bit Wide Organization 


FFFFFFFFH FFFFFFFCH 
00000003H 00000000H 
a es es eae) 

BE3# BE2# BE1# BEO# 


16-Bit Wide Organization 
FFFFFFFFH FFFFFFFEH 


00000001H 00000000H 


— —— 
BHE# BLE# 


Figure 10-2. Physical Memory and I/O Space Organization 


16-bit memories are organized as arrays of two bytes each. Each two bytes begins at addresses 
divisible by two. The byte enables BE[3:0]#, must be decoded to Al, BLE# and BHE# to address 
16-bit memories. 


To address 8-bit memories, the two low order address bits AO and Al must be decoded from 
BE[3:0]#. The same logic can be used for 8- and 16-bit memories, because the decoding logic for 
BLE# and AO are the same. (See Section 10.1.3, “Interfacing with 8-, 16-, and 32-Bit Memo- 
ries.” 
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10.1.2 Dynamic Data Bus Sizing 


Dynamic data bus sizing is a feature that allows processor connection to 32-, 16- or 8-bit buses 
for memory or I/O. The Intel486 processors can access all three bus sizes, except for the Ultra- 
Low Power Intel486 GX processor, which has a 16-bit data bus. Transfers to or from 32-, 16- or 
8-bit devices are supported by dynamically determining the bus width during each bus cycle. Ad- 
dress decoding circuitry may assert BS16# for 16-bit devices or BS8# for 8-bit devices during 
each bus cycle. BS8# and BS16# must be deasserted when addressing 32-bit devices. An 8-bit 
bus width is selected if both BS16# and BS8# are asserted. 


BS16# and BS8# force the Intel486 processor to run additional bus cycles to complete requests 
larger than 16 or 8 bits. A 32-bit transfer is converted into two 16-bit transfers (or 3 transfers if 
the data is misaligned) when BS16# is asserted. Asserting BS8# converts a 32-bit transfer into 
four 8-bit transfers. 


Extra cycles forced by BS16# or BS8# should be viewed as independent bus cycles. BS 16# or 
BS8# must be asserted during each of the extra cycles unless the addressed device has the ability 
to change the number of bytes it can return between cycles. 


The Intel486 processor drives the byte enables appropriately during extra cycles forced by BS8# 
and BS16#. A[31:2] does not change if accesses are to a 32-bit aligned area. Table 10-3 shows 
the set of byte enables that is generated on the next cycle for each of the valid possibilities of the 
byte enables on the current cycle. 


The dynamic bus sizing feature of the Intel486 processor is significantly different than that of the 
Intel386™ processor. Unlike the Intel386 processor, the Intel486 processor requires that data 
bytes be driven on the addressed data pins. The simplest example of this function is a 32-bit 
aligned, BS 16# read. When the Intel486 processor reads the two high order bytes, they must be 
driven on the data bus pins D[31:16]. The Intel486 processor expects the two low order bytes on 
D[15:0]. The Intel386 processor expects both the high and low order bytes on D[15:0]. The 
Intel386 processor always reads or writes data on the lower 16 bits of the data bus when BS16# 
is asserted. 


The external system must contain buffers to enable the Intel486 processor to read and write data 
on the appropriate data bus pins. Table 10-4 shows the data bus lines to which the Intel486 pro- 
cessor expects data to be returned for each valid combination of byte enables and bus sizing op- 
tions. 
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Table 10-3. Next Byte Enable Values for BSx# Cycles 


Current Next with Next with BS16# 


BE3# | BE2# | BE1# | BEO# | BE3# | BE2# | BE1# | BEO# | BE3# | BE2# | BE1# | BEO# 
1 1 1 0 N N N N N N N N 


NOTE: “N’ means that another bus cycle is not required to satisfy the request. 


Table 10-4. Data Pins Read with Different Bus Sizes 


BE3# BE2# BE1# BEO# w/o BS8#/BS16# w BS8# w BS16# 
1 1 1 0 D[7:0] DI7:0] D[7:0] 
D[15:0] DI7:0] D[15:0] 
D[23:0] DI7:0] D[15:0] 
D[31:0] DI7:0] D[15:0] 
D[15:8] D[15:8] D[15:8] 
D[23:8] D[15:8] D[15:8] 
D[31:8] D[15:8] D[15:8] 
D[23:16] D[23:16] D[23:16] 
D[31:16] D[23:16] D[31:16] 
D[31:24] D[31:24] D[31:24] 


Valid data is only driven onto data bus pins corresponding to asserted byte enables during write 
cycles. Other pins in the data bus are driven but they contain no valid data. Unlike the Intel386 
processor, the Intel486 processor does not duplicate write data onto parts of the data bus for 
which the corresponding byte enable is deasserted. 
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10.1.3 Interfacing with 8-, 16-, and 32-Bit Memories 


In 32-bit physical memories, such as the one shown in Figure 10-3, each 4-byte word begins at a 
byte address that is a multiple of four. A[31:2] are used as a 4-byte word select. BE[3:0]# select 
individual bytes within the 4-byte word. BS8# and BS16# are deasserted for all bus cycles involv- 
ing the 32-bit array. 


For 16- and 8-bit memories, byte swapping logic is required for routing data to the appropriate 
data lines and logic is required for generating BHE#, BLE# and A1. In systems where mixed 
memory widths are used, extra address decoding logic is necessary to assert BS16# or BS8#. 


Data Bus (D[31:0]) 


Intel486™ 
Processor 


Address Bus 
(BE[3:0]#, A[31:2]) 


“HIGH” “HIGH” 


Figure 10-3. Intel486™ Processor with 32-Bit Memory 


Figure 10-4 shows the Intel486 processor address bus interface to 32-, 16- and 8-bit memories. 
To address 16-bit memories the byte enables must be decoded to produce Al, BHE# and BLE# 
(AO). For 8-bit wide memories the byte enables must be decoded to produce AO and Al. The 
same byte select logic can be used in 16- and 8-bit systems, because BLE# is exactly the same as 
AO (see Table 10-5). 


10-6 | 


intel. BUS OPERATION 


Intel486™ Address Bus (A[31:2], BE[3:0]#) 
Processor 


BS8# BS16# 


Address 
Decode 


32-Bit Memory 
16-Bit Memory 


8-Bit Memory 


BE[3:0]# can be decoded as shown in Table 10-5. The byte select logic necessary to generate 
BHE# and BLE# is shown in Figure 10-5. 


AI31:2] 


Byte 
Select Logic 


A(31:2] 


BHE#, BLE#, A1 


Figure 10-4. Addressing 16- and 8-Bit Memories 
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Table 10-5. Generating A1, BHE# and BLE# for Addressing 16-Bit Devices 


Intel486™ Processor 


8-, 16-Bit Bus Signals 


Comments 
BE3# | BE2# | BE1# | BEO# At? BHE#? | BLE# (AO)! 

1 xt 47 it x Xx Xx x-no asserted bytes 
1 1 1 0) 0 1 0 

1 1 0 1 0 0 i 

1 1 0 0 0 0 0 

1 0 1 1 ; : ue 

qt ot it ot x x x x-not contiguous bytes 
1 1 0 ) 1 

1 0 0 0 0 

0 1 1 1 1 0 1 

ot it it ot x x x x-not contiguous bytes 
of qf of it x X x x-not contiguous bytes 
of if ot ot x X x x-not contiguous bytes 
0 1 1 1 0 0 

of ot qt ot x x x x-not contiguous bytes 
0 0) 0 1 0 0 1 

0 0 0 0 0 0 0 

NOTES: 


1. BLE# asserted when D[7:0] of 16-bit bus is asserted. 
2. BHE# asserted when D[15:8] of 16-bit bus is asserted. 
3. A‘ low for all even words; A1 high for all odd words. 


KEY: 
x=  don'tcare 


+= anon-occurring pattern of byte enables; either none are asserted or the pattern has byte 
enables asserted for non-contiguous bytes 
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BEO# 
Al 
BE1# 
240950—42 
BE1# 
BE3# 
240950-43 


BEO# 


BLE# (OR AO) 


240950—44 


Figure 10-5. Logic to Generate A1, BHE# and BLE# for 16-Bit Buses 


Combinations of BE[3:0]# that never occur are those in which two or three asserted byte enables 
are separated by one or more deasserted byte enables. These combinations are “don't care” con- 
ditions in the decoder. A decoder can use the non-occurring BE[3:0]# combinations to its best 
advantage. 


Figure 10-6 shows an Intel486 processor data bus interface to 16- and 8-bit wide memories. Ex- 
ternal byte swapping logic is needed on the data lines so that data is supplied to and received from 
the Intel486 processor on the correct data pins (see Table 10-4). 
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D[7:0] 8 
Intel486™ his 2 
Processor Di23:16 8 
DI31:24 8 


Byte Swap 
Logic 
les Byte Swap 
Logic 


Address 
Decode 


Figure 10-6. Data Bus Interface to 16- and 8-Bit Memories 


10.1.4 Dynamic Bus Sizing during Cache Line Files 


BS8# and BS 16# can be driven during cache line fills. The Intel486 processor generates enough 
8- or 16-bit cycles to fill the cache line. This can be up to sixteen 8-bit cycles. 


The external system should assume that all byte enables are asserted for the first cycle of a cache 
line fill. The Intel486 processor generates proper byte enables for subsequent cycles in the line 
fill. Table 10-6 shows the appropriate AO (BLE#), Al and BHE# for the various combinations of 
the Intel486 processor byte enables on both the first and subsequent cycles of the cache line fill. 
The “*” marks all combinations of byte enables that are generated by the Intel486 processor dur- 
ing a cache line fill. 
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Table 10-6. Generating AO, A1 and BHE# from the Intel486™ Processor Byte Enables 


First Cache Fill Cycle Any Other Cycle 
BE3# BE2# BE1# BEO# 
AO Al BHE# AO Al BHE# 
1 1 1 0 ) 0 0 0 0 1 
1 1 0 0 0 0 0 0 0 0 
1 0 0 0 0 ) ) 0 0 0 
to 0 0 0 0 0 0 0 0 0 
1 1 0 1 0) 0 0 1 0 0 
to 0 0 1 0 0 0 1 0 0 
1 0 1 1 0 0 0 0 1 1 
i) 0 1 1 0 0 0 0 1 0 
to 1 1 1 ) ) ) 1 1 0 
KEY: 


+= anon-occurring pattern of Byte Enables; either none are asserted or the pattern has byte 
enables asserted for non-contiguous bytes 


10.1.5 Operand Alignment 


Physical 4-byte words begin at addresses that are multiples of four. It is possible to transfer a log- 
ical operand that spans more than one physical 4-byte word of memory or I/O at the expense of 
extra cycles. Examples are 4-byte operands beginning at addresses that are not evenly divisible 
by 4, or 2-byte words split between two physical 4-byte words. These are referred to as unaligned 
transfers. 


Operand alignment and data bus size dictate when multiple bus cycles are required. Table 10-7 
describes the transfer cycles generated for all combinations of logical operand lengths, alignment, 
and data bus sizing. When multiple cycles are required to transfer a multibyte logical operand, 
the highest-order bytes are transferred first. For example, when the processor executes a 4-byte 
unaligned read beginning at byte location 11 in the 4-byte aligned space, the three high-order 
bytes are read in the first bus cycle. The low byte is read in a subsequent bus cycle. 
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Table 10-7. Transfer Bus Cycles for Bytes, Words and Dwords 


Byte-Length of Logical Operand 
1 2 4 

Physical Byte Address in XX 00 01 10 11 00 01 10 11 
Memory (Low Order Bits) 
Transfer Cycles over 32-Bit b Ww Ww Ww hb d hb hw h3 
Bus Ib 13 lw Ib 
Transfer Cycles over 16-Bit b w Ib t w hb wt hb hw | mwt 
Bus hb * Ib hw* | Ibt lw hb * 
(* = BS#16 asserted) mw + Ib 
Transfer Cycles over 8-Bit b Ilb* | Ib? | bb? hb Ib # hb | mhb* | mlb? 
Bus hb* | hb? | hb# Ib mib# | Ib? hb* | mhb? 
(? = BS8# Asserted) mhb#] mib# | Ib? hb # 

hb* | mhb* | mlb? Ib 
KEY: 
b = byte transfer h = high-order portion 4-Byte Operand Ib mlb mhb hb 
w = 2-byte transfer | = low-order portion byte with Toyte with 


3 = 3-byte transfer m = mid-order portion 
d = 4-byte transfer 


lowest address _ highest address 
The function of unaligned transfers with dynamic bus sizing is not obvious. When the external 
systems asserts BS16# or BS8#, forcing extra cycles, low-order bytes or words are transferred 
first (opposite to the example above). When the Intel486 processor requests a 4-byte read and the 
external system asserts BS 16#, the lower two bytes are read first followed by the upper two bytes. 


In the unaligned transfer described above, the processor requested three bytes on the first cycle. 
When the external system asserts BS 16# during this 3-byte transfer, the lower word is transferred 
first followed by the upper byte. In the final cycle, the lower byte of the 4-byte operand is trans- 
ferred, as shown in the 32-bit example above. 
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10.2 BUS ARBITRATION LOGIC 


Bus arbitration logic is needed with multiple bus masters. Hardware implementations range from 
single-master designs to those with multiple masters and DMA devices. 


Figure 10-7 shows a simple system in which only one master controls the bus and accesses the 
memory and I/O devices. Here, no arbitration is required. 


Intel486™ 
Processor 


Address Bus 


Control Bus 


Figure 10-7. Single Master Intel486™ Processor System 
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Figure 10-8 shows a single processor and a DMA device. Here, arbitration is required to deter- 
mine whether the processor, which acts as a master most of the time, or a DMA controller has 
control of the bus. When the DMA wants control of the bus, it asserts the HOLD request to the 
processor. The processor then responds with a HLDA output when it is ready to relinquish bus 
control to the DMA device. Once the DMA device completes its bus activity cycles, it negates 
the HOLD signal to relinquish the bus and return control to the processor. 


Intel486™ 
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Control Bus 


Figure 10-8. Single Intel486™ Processor with DMA 
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Figure 10-9 shows more than one primary bus master and two secondary masters, and the arbi- 
tration logic is more complex. The arbitration logic resolves bus contention by ensuring that all 
device requests are serviced one at a time using either a fixed or a rotating scheme. The arbitration 
logic then passes information to the Intel486 processor, which ultimately releases the bus. The 
arbitration logic receives bus control status information via the HOLD and HLDA signals and re- 
lays it to the requesting devices. 


Arbitration 
Logic 


DRQ 
Intel486™ 
Processor 


DACK 


Address Bus 


Data Bus 


Control Bus 


Figure 10-9. Single Intel486™ Processor with Multiple Secondary Masters 


As systems become more complex and include multiple bus masters, hardware must be added to 
arbitrate and assign the management of bus time to each master. The second master may be a 
DMA controller that requires bus time to perform memory transfers or it may be a second pro- 
cessor that requires the bus to perform memory or I/O cycles. Any of these devices may act as a 
bus master. The arbitration logic must assign only one bus master at a time so that there is no con- 
tention between devices when accessing main memory. 
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The arbitration logic may be implemented in several different ways. The first technique is to 
“round-robin” or to “time slice” each master. Each master is given a block of time on the bus to 
match their priority and need for the bus. 


Another method of arbitration is to assign the bus to a master when the bus is needed. Assigning 
the bus requires the arbitration logic to sample the BREQ or HOLD outputs from the potential 
masters and to assign the bus to the requestor. A priority scheme must be included to handle cases 
where more than one device is requesting the bus. The arbitration logic must assert HOLD to the 
device that must relinquish the bus. Once HLDA is asserted by all of these devices, the arbitration 
logic may assert HLDA or BACK# to the device requesting the bus. The requestor remains the 
bus master until another device needs the bus. 


These two arbitration techniques can be combined to create a more elaborate arbitration scheme 
that is driven by a device that needs the bus but guarantees that every device gets time on the bus. 
It is important that an arbitration scheme be selected to best fit the needs of each system's imple- 
mentation. 


The Intel486 processor asserts BREQ when it requires control of the bus. BREQ notifies the ar- 
bitration logic that the processor has pending bus activity and requests the bus. When its HOLD 
input is inactive and its HLDA signal is deasserted, the Intel486 processor can acquire the bus. 
Otherwise if HOLD is asserted, then the Intel486 processor has to wait for HOLD to be deassert- 
ed before acquiring the bus. If the Intel486 processor does not have the bus, then its address, data, 
and status pins are 3-stated. However, the processor can execute instructions out of the internal 
cache or instruction queue, and does not need control of the bus to remain active. 


The address buses shown in Figure 10-8 and Figure 10-9 are bidirectional to allow cache invali- 
dations to the processors during memory writes on the bus. 


10.3 BUS FUNCTIONAL DESCRIPTION 


The Intel486 processor supports a wide variety of bus transfers to meet the needs of high perfor- 
mance systems. Bus transfers can be single cycle or multiple cycle, burst or non-burst, cacheable 
or non-cacheable, 8-, 16- or 32-bit, and pseudo-locked. Cache invalidation cycles and locked cy- 
cles provide support for multiprocessor systems. 


This section explains basic non-cacheable, non-burst single cycle transfers. It also details multi- 
ple cycle transfers and introduces the burst mode. Cacheability is introduced in Section 10.3.3, 
“Cacheable Cycles.” The remaining sections describe locked, pseudo-locked, invalidate, bus 
hold, and interrupt cycles. 


Bus cycles and data cycles are discussed in this section. A bus cycle is at least two clocks long 
and begins with ADS# asserted in the first clock and RDY# or BRDY# asserted in the last clock. 
Data is transferred to or from the Intel486 processor during a data cycle. A bus cycle contains one 
or more data cycles. 


Refer to Section 10.3.13, “Bus States,” for a description of the bus states shown in the timing di- 
agrams. 
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10.3.1 Non-Cacheable Non-Burst Single Cycles 


10.3.1.1 No Wait States 


The fastest non-burst bus cycle that the Intel486 processor supports is two clocks. These cycles 
are called 2-2 cycles because reads and writes take two cycles each. The first “2” refers to reads 
and the second “2” to writes. If a wait state needs to be added to the write, the cycle is called 
937 

Basic two-clock read and write cycles are shown in Figure 10-10. The Intel486 processor initiates 
a cycle by asserting the address status signal (ADS#) at the rising edge of the first clock. The 


ADS# output indicates that a valid bus cycle definition and address is available on the cycle def- 
inition lines and address bus. 
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242202-031 


Figure 10-10. Basic 2-2 Bus Cycle 


The non-burst ready input (RDY*#) is asserted by the external system in the second clock. RDY# 
indicates that the external system has presented valid data on the data pins in response to a read 
or the external system has accepted data in response to a write. 


The Intel486 processor samples RDY# at the end of the second clock. The cycle is complete if 
RDY# is asserted (LOW) when sampled. Note that RD Y# is ignored at the end of the first clock 
of the bus cycle. 
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The burst last signal (BLAST#) is asserted (LOW) by the Intel486 processor during the second 
clock of the first cycle in all bus transfers illustrated in Figure 10-10. This indicates that each 
transfer is complete after a single cycle. The Intel486 processor asserts BLAST# in the last cycle, 
“T2”, of a bus transfer. 


The timing of the parity check output (PCHK#) is shown in Figure 10-10. The Intel486 processor 
drives the PCHK# output one clock after RDY# or BRD Y# terminates a read cycle. PCHK# in- 
dicates the parity status for the data sampled at the end of the previous clock. The PCHK# signal 
can be used by the external system. The Intel486 processor does nothing in response to the 
PCHK# output. 


10.3.1.2 Inserting Wait States 


The external system can insert wait states into the basic 2-2 cycle by deasserting RD Y# at the end 
of the second clock. RDY# must be deasserted to insert a wait state. Figure 10-11 illustrates a 
simple non-burst, non-cacheable signal with one wait state added. Any number of wait states can 
be added to an Intel486 processor bus cycle by maintaining RD Y# deasserted. 
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Figure 10-11. Basic 3-3 Bus Cycle 


The burst ready input (BRD Y#) must be deasserted on all clock edges where RDY# is deasserted 
for proper operation of these simple non-burst cycles. 
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10.3.2 Multiple and Burst Cycle Bus Transfers 


Multiple cycle bus transfers can be caused by internal requests from the Intel486 processor or by 
the external memory system. An internal request for a 128-bit pre-fetch requires more than one 
cycle. Internal requests for unaligned data may also require multiple bus cycles. A cache line fill 
requires multiple cycles to complete. 


The external system can cause a multiple cycle transfer when it can only supply 8- or 16-bits per 
cycle. 


Only multiple cycle transfers caused by internal requests are considered in this section. Cache- 
able cycles and 8- and 16-bit transfers are covered in Section 10.3.3, “Cacheable Cycles,” and 
Section 10.3.5, “8- and 16-Bit Cycles.” 


An internal request by an IntelDX2 or IntelD X4 processor for a 64-bit floating-point load must 
take more than one internal cycle. 


10.3.2.1 Burst Cycles 


The Intel486 processor can accept burst cycles for any bus requests that require more than a single 
data cycle. During burst cycles, a new data item is strobed into the Intel486 processor every clock 
rather than every other clock as in non-burst cycles. The fastest burst cycle requires two clocks 
for the first data item, with subsequent data items returned every clock. 


The Intel486 processor is capable of bursting a maximum of 32 bits during a write. Burst writes 
can only occur if BS8# or BS 16# is asserted. For example, the Intel486 processor can burst write 
four 8-bit operands or two 16-bit operands in a single burst cycle. But the Intel486 processor can- 
not burst multiple 32-bit writes in a single burst cycle. 


Burst cycles begin with the Intel486 processor driving out an address and asserting ADS# in the 
same manner as non-burst cycles. The Intel486 processor indicates that it is willing to perform a 
burst cycle by holding the burst last signal (BLAST#) deasserted in the second clock of the cycle. 
The external system indicates its willingness to do a burst cycle by asserting the burst ready signal 
(BRDY#). 


The addresses of the data items in a burst cycle all fall within the same 16-byte aligned area (cor- 
responding to an internal Intel486 processor cache line). A 16-byte aligned area begins at location 
XXXXXXXO and ends at location XXXXXXXF. During a burst cycle, only BE[3:0]#, A2, and 
A3 may change. A[31:4], M/IO#, D/C#, and W/R# remain stable throughout a burst. Given the 
first address in a burst, external hardware can easily calculate the address of subsequent transfers 
in advance. An external memory system can be designed to quickly fill the Intel486 processor 
internal cache lines. 


Burst cycles are not limited to cache line fills. Any multiple cycle read request by the Intel486 
processor can be converted into a burst cycle. The Intel486 processor only bursts the number of 
bytes needed to complete a transfer. For example, the IntelDX2 and Write-Back Enhanced 
IntelDX4 processors burst eight bytes for a 64-bit floating-point non-cacheable read. 
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The external system converts a multiple cycle request into a burst cycle by asserting BRD Y# rath- 
er than RDY# (non-burst ready) in the first cycle of a transfer. For cycles that cannot be burst, 
such as interrupt acknowledge and halt, BRD Y# has the same effect as RDY#. BRD Y# is ignored 
if both BRDY# and RDY# are asserted in the same clock. Memory areas and peripheral devices 
that cannot perform bursting must terminate cycles with RDY#. 


10.3.2.2. Terminating Multiple and Burst Cycle Transfers 


The Intel486 processor deasserts BLAST# for all but the last cycle in a multiple cycle transfer. 
BLAST# is deasserted in the first cycle to inform the external system that the transfer could take 
additional cycles. BLAST# is asserted in the last cycle of the transfer to indicate that the next time 
BRDY# or RDY# is asserted the transfer is complete. 


BLAST# is not valid in the first clock of a bus cycle. It should be sampled only in the second and 
subsequent clocks when RDY# or BRDY# is asserted. 


The number of cycles in a transfer is a function of several factors including the number of bytes 
the Intel486 processor needs to complete an internal request (1, 2, 4, 8, or 16), the state of the bus 
size inputs (BS8# and BS 16#), the state of the cache enable input (KEN#) and the alignment of 
the data to be transferred. 


When the Intel486 processor initiates a request, it knows how many bytes are transferred and if 
the data is aligned. The external system must indicate whether the data is cacheable (if the transfer 
is a read) and the width of the bus by returning the state of the KEN#, BS8# and BS16# inputs 
one clock before RDY# or BRDY# is asserted. The Intel486 processor determines how many cy- 
cles a transfer will take based on its internal information and inputs from the external system. 


BLAST# is not valid in the first clock of a bus cycle because the Intel486 processor cannot de- 
termine the number of cycles a transfer will take until the external system asserts KEN#, BS8# 
and BS16#. BLAST# should only be sampled in the second T2 state and subsequent T2 states of 
a cycle when the external system asserts RDY# or BRDY#. 


The system may terminate a burst cycle by asserting RDY# instead of BRDY#. BLAST# remains 
deasserted until the last transfer. However, any transfers required to complete a cache line fill fol- 
low the burst order; for example, if burst order was 4, 0, C, 8 and RDY# was asserted after 0, the 
next transfers are from C and 8. 


10.3.2.3. Non-Cacheable, Non-Burst, Multiple Cycle Transfers 


Figure 10-12 illustrates a two-cycle, non-burst, non-cacheable read. This transfer is simply a se- 
quence of two single cycle transfers. The Intel486 processor indicates to the external system that 
this is a multiple cycle transfer by deasserting BLAST# during the second clock of the first cycle. 
The external system asserts RDY# to indicate that it will not burst the data. The external system 
also indicates that the data is not cacheable by deasserting KEN# one clock before it asserts 
RDY#. When the Intel486 processor samples RDY# asserted, it ignores BRD Y#. 
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Figure 10-12. Non-Cacheable, Non-Burst, Multiple-Cycle Transfers 


Each cycle in the transfer begins when ADS# is asserted and the cycle is complete when the ex- 
ternal system asserts RDY#. 


The Intel486 processor indicates the last cycle of the transfer by asserting BLAST#. The next 
RDY# asserted by the external system terminates the transfer. 


10.3.2.4 | Non-Cacheable Burst Cycles 


The external system converts a multiple cycle request into a burst cycle by asserting BRD Y# rath- 
er than RDY# in the first cycle of the transfer. This is illustrated in Figure 10-13. 


There are several features to note in the burst read. ADS# is asserted only during the first cycle 
of the transfer. RDY# must be deasserted when BRDY?# is asserted. 


BLAST# behaves exactly as it does in the non-burst read. BLAST# is deasserted in the second 
clock of the first cycle of the transfer, indicating more cycles to follow. In the last cycle, BLAST# 
is asserted, prompting the external memory system to end the burst after asserting the next 
BRDY#. 
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Figure 10-13. Non-Cacheable Burst Cycle 


10.3.3 Cacheable Cycles 


Any memory read can become a cache fill operation. The external memory system can allow a 
read request to fill a cache line by asserting KEN# one clock before RDY# or BRD Y# during the 
first cycle of the transfer on the external bus. Once KEN# is asserted and the remaining three re- 
quirements described below are met, the Intel486 processor fetches an entire cache line regard- 
less of the state of KEN#. KEN# must be asserted in the last cycle of the transfer for the data to 
be written into the internal cache. The Intel486 processor converts only memory reads or 
prefetches into a cache fill. 


KEN# is ignored during write or I/O cycles. Memory writes are stored only in the on-chip cache 
if there is a cache hit. I/O space is never cached in the internal cache. 
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To transform a read or a prefetch into a cache line fill, the following conditions must be met: 


1. The KEN# pin must be asserted one clock prior to RDY# or BRD Y# being asserted for the 
first data cycle. 


2. The cycle must be of a type that can be internally cached. (Locked reads, I/O reads, and 
interrupt acknowledge cycles are never cached.) 


3. The page table entry must have the page cache disable bit (PCD) set to 0. To cache a page 
table entry, the page directory must have PCD=0. To cache reads or prefetches when 
paging is disabled, or to cache the page directory entry, control register 3 (CR3) must have 
PCD=0. 


4. The cache disable (CD) bit in control register 0 (CRO) must be clear. 


External hardware can determine when the Intel486 processor has transformed a read or prefetch 
into a cache fill by examining the KEN#, M/IO#, D/C#, W/R#, LOCK#, and PCD pins. These 
pins convey to the system the outcome of conditions 1-3 in the above list. In addition, the 
Intel486 processor drives PCD high whenever the CD bit in CRO is set, so that external hardware 
can evaluate condition 4. 


Cacheable cycles can be burst or non-burst. 


10.3.3.1 Byte Enables during a Cache Line Fill 


For the first cycle in the line fill, the state of the byte enables should be ignored. In a non-cache- 
able memory read, the byte enables indicate the bytes actually required by the memory or code 
fetch. 


The Intel486 processor expects to receive valid data on its entire bus (32 bits) in the first cycle of 
a cache line fill. Data should be returned with the assumption that all the byte enable pins are as- 
serted. However if BS8# is asserted, only one byte should be returned on data lines D[7:0]. Sim- 
ilarly if BS16# is asserted, two bytes should be returned on D[15:0]. 


The Intel486 processor generates the addresses and byte enables for all subsequent cycles in the 
line fill. The order in which data is read during a line fill depends on the address of the first item 
read. Byte ordering is discussed in Section 10.3.4, “Burst Mode Details.” 
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10.3.3.2 | Non-Burst Cacheable Cycles 


Figure 10-14 shows a non-burst cacheable cycle. The cycle becomes a cache fill when the 
Intel486 processor samples KEN# asserted at the end of the first clock. The Intel486 processor 
deasserts BLAST# in the second clock in response to KEN#. BLAST# is deasserted because a 
cache fill requires three additional cycles to complete. BLAST# remains deasserted until the last 
transfer in the cache line fill. KEN# must be asserted in the last cycle of the transfer for the data 
to be written into the internal cache. 


Note that this cycle would be a single bus cycle if KEN# was not sampled asserted at the end of 
the first clock. The subsequent three reads would not have happened since a cache fill was not 
requested. 


The BLAST# output is invalid in the first clock of a cycle. BLAST# may be asserted during the 
first clock due to earlier inputs. Ignore BLAST# until the second clock. 


During the first cycle of the cache line fill the external system should treat the byte enables as if 
they are all asserted. In subsequent cycles in the burst, the Intel486 processor drives the address 
lines and byte enables. (See Section 10.3.4.2, “Burst and Cache Line Fill Order.’’) 
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Figure 10-14. Non-Burst, Cacheable Cycles 
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10.3.3.3. Burst Cacheable Cycles 


Figure 10-15 illustrates a burst mode cache fill. As in Figure 10-14, the transfer becomes a cache 
line fill when the external system asserts KEN# at the end of the first clock in the cycle. 


The external system informs the Intel486 processor that it will burst the line in by asserting 
BRDY# at the end of the first cycle in the transfer. 


Note that during a burst cycle, ADS# is only driven with the first address. 
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Figure 10-15. Burst Cacheable Cycle 
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10.3.3.4 Effect of Changing KEN# during a Cache Line Fill 


KEN# can change multiple times as long as it arrives at its final value in the clock before RDY# 
or BRDY# is asserted. This is illustrated in Figure 10-16. Note that the timing of BLAST# fol- 
lows that of KEN# by one clock. The Intel486 processor samples KEN# every clock and uses the 
value returned in the clock before BRDY# or RDY# to determine if a bus cycle would be a cache 
line fill. Similarly, it uses the value of KEN# in the last cycle before early RD Y# to load the line 
just retrieved from memory into the cache. KEN# is sampled every clock and it must satisfy setup 
and hold times. 


KEN# can also change multiple times before a burst cycle, as long as it arrives at its final value 
one clock before BRD Y# or RDY# is asserted. 
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Figure 10-16. Effect of Changing KEN# 


10-26 | 


intel. BUS OPERATION 


10.3.4 Burst Mode Details 


10.3.4.1. Adding Wait States to Burst Cycles 


Burst cycles need not return data on every clock. The Intel486 processor strobes data into the chip 
only when either RDY# or BRD Y# is asserted. Deasserting BRDY# and RDY# adds a wait state 
to the transfer. A burst cycle where two clocks are required for every burst item is shown in 
Figure 10-17. 
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Figure 10-17. Slow Burst Cycle 
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10.3.4.2 Burst and Cache Line Fill Order 


The burst order used by the Intel486 processor is shown in Table 10-8. This burst order is fol- 
lowed by any burst cycle (cache or not), cache line fill (burst or not) or code prefetch. 


The Intel486 processor presents each request for data in an order determined by the first address 
in the transfer. For example, if the first address was 104 the next three addresses in the burst will 
be 100, 10C and 108. An example of burst address sequencing is shown in Figure 10-18. 


Table 10-8. Burst Order (Both Read and Write Bursts) 


First Address Second Address | Third Address | Fourth Address 
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Figure 10-18. Burst Cycle Showing Order of Addresses 
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The sequences shown in Table 10-8 accommodate systems with 64-bit buses as well as systems 
with 32-bit data buses. The sequence applies to all bursts, regardless of whether the purpose of 
the burst is to fill a cache line, perform a 64-bit read, or perform a pre-fetch. If either BS8# or 
BS 16# is asserted, the Intel486 processor completes the transfer of the current 32-bit word before 
progressing to the next 32-bit word. For example, a BS16# burst to address 4 has the following 
order: 4-6-0-2-C-E-8-A. 


10.3.4.3 Interrupted Burst Cycles 


Some memory systems may not be able to respond with burst cycles in the order defined in Table 
10-8. To support these systems, the Intel486 processor allows a burst cycle to be interrupted at 
any time. The Intel486 processor automatically generates another normal bus cycle after being 
interrupted to complete the data transfer. This is called an interrupted burst cycle. The external 
system can respond to an interrupted burst cycle with another burst cycle. 


The external system can interrupt a burst cycle by asserting RD Y# instead of BRDY#. RDY# can 
be asserted after any number of data cycles terminated with BRDY#. 


An example of an interrupted burst cycle is shown in Figure 10-19. The Intel486 processor im- 
mediately asserts ADS# to initiate a new bus cycle after RDY# is asserted. BLAST# is deasserted 
one clock after ADS# begins the second bus cycle, indicating that the transfer is not complete. 


CLK 


ADS# 


A31—-A2 


RDY# 


BRDY# 


KEN# 


BLAST# 


DATA 


+ To Processor 


242202-067 


Figure 10-19. Interrupted Burst Cycle 
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KEN+# need not be asserted in the first data cycle of the second part of the transfer shown in 
Figure 10-20. The cycle had been converted to a cache fill in the first part of the transfer and the 
Intel486 processor expects the cache fill to be completed. Note that the first half and second half 
of the transfer in Figure 10-19 are both two-cycle burst transfers. 


The order in which the Intel486 processor requests operands during an interrupted burst transfer 
is shown by Table 10-7 on page 10-12. Mixing RD Y# and BRDY# does not change the order in 
which operand addresses are requested by the Intel486 processor. 


An example of the order in which the Intel486 processor requests operands during a cycle in 
which the external system mixes RDY# and BRD Y# is shown in Figure 10-20. The Intel486 pro- 
cessor initially requests a transfer beginning at location 104. The transfer becomes a cache line 
fill when the external system asserts KEN#. The first cycle of the cache fill transfers the contents 
of location 104 and is terminated with RDY#. The Intel486 processor drives out a new request 
(by asserting ADS#) to address 100. If the external system terminates the second cycle with 
BRDY#, the Intel486 processor next requests/expects address 10C. The correct order is deter- 
mined by the first cycle in the transfer, which may not be the first cycle in the burst if the system 
mixes RDY# with BRDY#. 
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Figure 10-20. Interrupted Burst Cycle with Non-Obvious Order of Addresses 


10-30 | 


intel. BUS OPERATION 


10.3.5 8- and 16-Bit Cycles 


The Intel486 processor supports both 16- and 8-bit external buses through the BS16# and BS8# 
inputs. BS 16# and BS8# allow the external system to specify, on a cycle-by-cycle basis, whether 
the addressed component can supply 8, 16 or 32 bits. BS 16# and BS8# can be used in burst cycles 
as well as non-burst cycles. If both BS16# and BS8# are asserted for any bus cycle, the Intel486 
processor responds as if only BS8# is asserted. 


The timing of BS16# and BS8# is the same as that of KEN#. BS16# and BS8# must be asserted 
before the first RDY# or BRDY# is asserted. Asserting BS16# and BS8# can force the Intel486 
processor to run additional cycles to complete what would have been only a single 32-bit cycle. 
BS8# and BS16# may change the state of BLAST# when they force subsequent cycles from the 
transfer. 


Figure 10-21 shows an example in which BS8# forces the Intel486 processor to run two extra cy- 
cles to complete a transfer. The Intel486 processor issues a request for 24 bits of information. The 
external system asserts BS8#, indicating that only eight bits of data can be supplied per cycle. The 
Intel486 processor issues two extra cycles to complete the transfer. 
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Figure 10-21. 8-Bit Bus Size Cycle 


Extra cycles forced by BS16# and BS8# signals should be viewed as independent bus cycles. 
BS16# and BS8# should be asserted for each additional cycle unless the addressed device can 
change the number of bytes it can return between cycles. The Intel486 processor deasserts 
BLAST# until the last cycle before the transfer is complete. 
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Refer to Section 10.1.2, “Dynamic Data Bus Sizing,” for the sequencing of addresses when BS8# 
or BS16# are asserted. 


During burst cycles, BS8# and BS 16# operate in the same manner as during non-burst cycles. For 
example, a single non-cacheable read could be transferred by the Intel486 processor as four 8-bit 
burst data cycles. Similarly, a single 32-bit write could be written as four 8-bit burst data cycles. 
An example of a burst write is shown in Figure 10-22. Burst writes can only occur if BS8# or 
BS 16# is asserted. 
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Figure 10-22. Burst Write as a Result of BS8# or BS16# 


10.3.6 Locked Cycles 


Locked cycles are generated in software for any instruction that performs a read-modify-write op- 
eration. During a read-modify-write operation, the Intel486 processor can read and modify a vari- 
able in external memory and ensure that the variable is not accessed between the read and write. 


Locked cycles are automatically generated during certain bus transfers. The XCHG (exchange) 
instruction generates a locked cycle when one of its operands is memory-based. Locked cycles 
are generated when a segment or page table entry is updated and during interrupt acknowledge 
cycles. Locked cycles are also generated when the LOCK instruction prefix is used with selected 
instructions. 
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Locked cycles are implemented in hardware with the LOCK# pin. When LOCK# is asserted, the 
Intel486 processor is performing a read-modify-write operation and the external bus should not 
be relinquished until the cycle is complete. Multiple reads or writes can be locked. A locked cycle 
is shown in Figure 10-23. LOCK# is asserted with the address and bus definition pins at the be- 
ginning of the first read cycle and remains asserted until RDY# is asserted for the last write cycle. 
For unaligned 32-bit read-modify-write operations, the LOCK# remains asserted for the entire 
duration of the multiple cycle. It deasserts when RDY# is asserted for the last write cycle. 


When LOCK# is asserted, the Intel486 processor recognizes address hold and backoff but does 
not recognize bus hold. It is left to the external system to properly arbitrate a central bus when the 
Intel486 processor generates LOCK#. 
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Figure 10-23. Locked Bus Cycle 


10.3.7 Pseudo-Locked Cycles 


Pseudo-locked cycles assure that no other master is given control of the bus during operand trans- 
fers that take more than one bus cycle. 


For the Intel486 processor, examples include 64-bit description loads and cache line fills. 


Pseudo-locked transfers are indicated by the PLOCK# pin. The memory operands must be 
aligned for correct operation of a pseudo-locked cycle. 


PLOCK# need not be examined during burst reads. A 64-bit aligned operand can be retrieved in 
one burst (note that this is only valid in systems that do not interrupt bursts). 


| 10-33 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


The system must examine PLOCK# during 64-bit writes since the Intel486 processor cannot 
burst write more than 32 bits. However, burst can be used within each 32-bit write cycle if BS8# 
or BS16# is asserted. BLAST is de-asserted in response to BS8# or BS16#. A 64-bit write is driv- 
en out as two non-burst bus cycles. BLAST# is asserted during both 32-bit writes, because a burst 
is not possible. PLOCK# is asserted during the first write to indicate that another write follows. 
This behavior is shown in Figure 10-24. 


The first cycle of a 64-bit floating-point write is the only case in which both PLOCK# and 
BLAST# are asserted. Normally PLOCK# and BLAST# are the inverse of each other. 


During all of the cycles in which PLOCK# is asserted, HOLD is not acknowledged until the cycle 
completes. This results in a large HOLD latency, especially when BS8# or BS16# is asserted. To 
reduce the HOLD latency during these cycles, windows are available between transfers to allow 
HOLD to be acknowledged during non-cacheable code prefetches. PLOCK# is asserted because 
BLAST# is deasserted, but PLOCK# is ignored and HOLD is recognized during the prefetch. 


PLOCK# can change several times during a cycle, settling to its final value in the clock in which 
RDY# is asserted. 


10.3.7.1 Floating-Point Read and Write Cycles 


For IntelDX2 and Write-Back Enhanced IntelDX4 processors, 64-bit floating-point read and 
write cycles are also examples of operand transfers that take more than one bus cycle. 


CLK 


ADS# 


A31-A2 
M/lO# 

D/C# 
BE3#-BE0# 


W/R# 


PLOCK# 


BLAST# 


DATA 


+ From Processor 


242202-144 


Figure 10-24. Pseudo Lock Timing 
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10.3.8 Invalidate Cycles 


Invalidate cycles keep the Intel486 processor internal cache contents consistent with external 
memory. The Intel486 processor contains a mechanism for monitoring writes by other devices to 
external memory. When the Intel486 processor finds a write to a section of external memory con- 
tained in its internal cache, the Intel486 processor’s internal copy is invalidated. 


Invalidations use two pins, address hold request (AHOLD) and valid external address (EADS#). 
There are two steps in an invalidation cycle. First, the external system asserts the AHOLD input 
forcing the Intel486 processor to immediately relinquish its address bus. Next, the external sys- 
tem asserts EADS#, indicating that a valid address is on the Intel486 processor address bus. 
Figure 10-25 shows the fastest possible invalidation cycle. The Intel486 processor recognizes 
AHOLD on one CLK edge and floats the address bus in response. To allow the address bus to 
float and avoid contention, EADS# and the invalidation address should not be driven until the 
following CLK edge. The Intel486 processor reads the address over its address lines. If the 
Intel486 processor finds this address in its internal cache, the cache entry is invalidated. Note that 
the Intel486 processor address bus is input/output, unlike the Intel386 processor’s bus, which is 
output only. 
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Figure 10-25. Fast Internal Cache Invalidation Cycle 
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Figure 10-26. Typical Internal Cache Invalidation Cycle 
10.3.8.1 Rate of Invalidate Cycles 


The Intel486 processor can accept one invalidate per clock except in the last clock of a line fill. 
One invalidate per clock is possible as long as EADS# is deasserted in ONE or BOTH of the fol- 
lowing cases: 


1. Inthe clock in which RDY# or BRDY# is asserted for the last time. 
2. Inthe clock following the clock in which RDY# or BRDY# is asserted for the last time. 


This definition allows two system designs. Simple designs can restrict invalidates to one every 
other clock. The simple design need not track bus activity. Alternatively, systems can request one 
invalidate per clock provided that the bus is monitored. 


10.3.8.2 Running Invalidate Cycles Concurrently with Line Fills 


Precautions are necessary to avoid caching stale data in the Intel486 processor cache in a system 
with a second-level cache. An example of a system with a second-level cache is shown in 
Figure 10-27. 


An external device can write to main memory over the system bus while the Intel486 processor 
is retrieving data from the second-level cache. The Intel486 processor must invalidate a line in its 
internal cache if the external device is writing to a main memory address that is also contained in 
the Intel486 processor cache. 
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A potential problem exists if the external device is writing to an address in external memory, and 
at the same time the Intel486 processor is reading data from the same address in the second-level 
cache. The system must force an invalidation cycle to invalidate the data that the Intel486 pro- 
cessor has requested during the line fill. 


Intel486™ 
Processor 


Address, Data and 
Control Bus 


Second-Level 
Cache 


Address, Data and 
Control Bus 


System Bus 


External External Bus 
Memory Master 


Figure 10-27. System with Second-Level Cache 
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If the system asserts EADS# before the first data in the line fill is returned to the Intel486 proces- 
sor, the system must return data consistent with the new data in the external memory upon re- 
sumption of the line fill after the invalidation cycle. This is illustrated by the asserted EADS# 
signal labeled “1” in Figure 10-28. 


If the system asserts EADS# at the same time or after the first data in the line fill is returned (in 
the same clock that the first RDY# or BRDY# is asserted or any subsequent clock in the line fill) 
the data is read into the Intel486 processor input buffers but it is not stored in the on-chip cache. 
This is illustrated by asserted EADS# signal labeled “2” in Figure 10-28. The stale data is used 
to satisfy the request that initiated the cache fill cycle. 
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NOTES: 
1. Data returned must be consistent if its address equals the invalidation address in this clock. 
2. Data returned is not cached if its address equals the invalidation address in this clock. 242202-093 


Figure 10-28. Cache Invalidation Cycle Concurrent with Line Fill 
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10.3.9 Bus Hold 


The Intel486 processor provides a bus hold, hold acknowledge protocol using the bus hold re- 
quest (HOLD) and bus hold acknowledge (HLDA) pins. Asserting the HOLD input indicates that 
another bus master has requested control of the Intel486 processor bus. The Intel486 processor 
responds by floating its bus and asserting HLDA when the current bus cycle, or sequence of 
locked cycles, is complete. An example of a HOLD/HLDA transaction is shown in Figure 10-29. 
Unlike the Intel386 processor, the Intel486 processor can respond to HOLD by floating its bus 
and asserting HLDA while RESET is asserted. 
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Figure 10-29. HOLD/HLDA Cycles 


Note that HOLD is recognized during un-aligned writes (less than or equal to 32 bits) with 
BLAST# being asserted for each write. For a write greater than 32-bits or an un-aligned write, 
HOLD# recognition is prevented by PLOCK# getting asserted. However, HOLD is recognized 
during non-cacheable, non-burstable code prefetches even though PLOCK+# is asserted. 
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For cacheable and non-burst or burst cycles, HOLD is acknowledged during backoff only if 
HOLD and BOFF# are asserted during an active bus cycle (after ADS# asserted) and before the 
first RDY# or BRDY# has been asserted (see Figure 10-30). The order in which HOLD and 
BOFF# are asserted is unimportant (as long as both are asserted prior to the first RDY#/BRDY# 
asserted by the system). Figure 10-30 shows the case where HOLD is asserted first; HOLD could 
be asserted simultaneously or after BOFF# and still be acknowledged. 


The pins floated during bus hold are: BE[3:0]#, PCD, PWT, W/R#, D/C#, M/O#, LOCK#, 
PLOCK#, ADS#, BLAST#, D[31:0], A[31:2], and DP[3:0]. 
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Figure 10-30. HOLD Request Acknowledged during BOFF# 
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10.3.10 Interrupt Acknowledge 


The Intel486 processor generates interrupt acknowledge cycles in response to maskable interrupt 
requests that are generated on the interrupt request input (INTR) pin. Interrupt acknowledge cy- 
cles have a unique cycle type generated on the cycle type pins. 


An example of an interrupt acknowledge transaction is shown in Figure 10-31. Interrupt ac- 
knowledge cycles are generated in locked pairs. Data returned during the first cycle is ignored. 
The interrupt vector is returned during the second cycle on the lower 8 bits of the data bus. The 
Intel486 processor has 256 possible interrupt vectors. 


The state of A2 distinguishes the first and second interrupt acknowledge cycles. The byte address 
driven during the first interrupt acknowledge cycle is 4 (A[31:3] low, A2 high, BE[3:1]# high, 
and BEO# low). The address driven during the second interrupt acknowledge cycle is 0 (A[31:2] 
low, BE[3:1]# high, BEO# low). 


Each of the interrupt acknowledge cycles is terminated when the external system asserts RDY# 
or BRDY#. Wait states can be added by holding RDY# or BRDY# deasserted. The Intel486 pro- 
cessor automatically generates four idle clocks between the first and second cycles to allow for 
8259A recovery time. 
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Figure 10-31. Interrupt Acknowledge Cycles 
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10.3.11 Special Bus Cycles 


The Intel486 processor provides special bus cycles to indicate that certain instructions have been 
executed, or certain conditions have occurred internally. The special bus cycles are identified by 
the status of the pins shown in Table 10-9. 


During these cycles the address bus is driven low while the data bus is undefined. 


Two of the special cycles indicate halt or shutdown. Another special cycle is generated when the 
Intel486 processor executes an INVD (invalidate data cache) instruction and could be used to 
flush an external cache. The Write Back cycle is generated when the Intel486 processor executes 
the WBINVD (write-back invalidate data cache) instruction and could be used to synchronize an 
external write-back cache. 


The external hardware must acknowledge these special bus cycles by asserting RDY# or 
BRDY#. 


10.3.11.1 HALT Indication Cycle 


The Intel486 processor halts as a result of executing a HALT instruction. A HALT indication cy- 
cle is performed to signal that the processor has entered into the HALT state. The HALT indica- 
tion cycle is identified by the bus definition signals in special bus cycle state and by a byte address 
of 2. BEO# and BE2# are the only signals that distinguish HALT indication from shutdown indi- 
cation, which drives an address of 0. During the HALT cycle, undefined data is driven on 
D[31:0]. The HALT indication cycle must be acknowledged by RDY# asserted. 


A halted Intel486 processor resumes execution when INTR (if interrupts are enabled), NMI, or 
RESET is asserted. 


10.3.11.2 Shutdown Indication Cycle 


The Intel486 processor shuts down as a result of a protection fault while attempting to process a 
double fault. A shutdown indication cycle is performed to indicate that the processor has entered 
a shutdown state. The shutdown indication cycle is identified by the bus definition signals in spe- 
cial bus cycle state and a byte address of 0. 


10.3.11.3 Stop Grant Indication Cycle 


A special Stop Grant bus cycle is driven to the bus after the processor recognizes the STPCLK# 
interrupt. The definition of this bus cycle is the same as the HALT cycle definition for the 
Intel486 processor, with the exception that the Stop Grant bus cycle drives the value 0000 0010H 
on the address pins. The system hardware must acknowledge this cycle by asserting RDY# or 
BRDY#. The processor does not enter the Stop Grant state until either RD Y# or BRD Y# has been 
asserted. (See Figure 10-32.) 


The Stop Grant Bus Cycle is defined as follows: 


M/IO# = 0, D/C# = 0, W/R# = 1, Address Bus = 0000 0010H (A4 = 1), BE[3:0]# = 1011, Data 
bus = undefined. 
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The latency between a STPCLK# request and the Stop Grant bus cycle is dependent on the cur- 
rent instruction, the amount of data in the processor write buffers, and the system memory per- 
formance. 
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Table 10-9. Special Bus Cycle Encoding 


Cycle Name M/lO# D/C# W/R# BE[3:0]# A4-A2 
Write-Back* 0 0 1 0111 000 
First Flush Ack Cyclet 0 0 1 0111 001 
Flush* 0 0 1 1101 000 
Second Flush Ack Cycle* 0 0 1 1101 001 
Shutdown 0 0 1 1110 000 
HALT 0 0 1 1011 000 
Stop Grant Ack Cycle 0 0 1 1011 100 


* These cycles are specific to the Write-Back Enhanced IntelIDX4™ processor. The FLUSH# cycle is 
applicable to all Intel486™ processors. See appropriate sections for details. 
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Figure 10-32. Stop Grant Bus Cycle 


A4401-01 
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10.3.12 Bus Cycle Restart 


In a multi-master system, another bus master may require the use of the bus to enable the Intel486 
processor to complete its current bus request. In this situation, the Intel486 processor must restart 
its bus cycle after the other bus master has completed its bus transaction. 


A bus cycle may be restarted if the external system asserts the backoff (BOFF#) input. The 
Intel486 processor samples the BOFF# pin every clock cycle. When BOFF# is asserted, the 
Intel486 processor floats its address, data, and status pins in the next clock (see Figures 10-33 and 
10-34). Any bus cycle in progress when BOFF# is asserted is aborted and any data returned to 
the processor is ignored. The pins that are floated in response to BOFF# are the same as those that 
are floated in response to HOLD. HLDA is not generated in response to BOFF#. BOFF# has 
higher priority than RDY# or BRD Y#. If either RDY# or BRDY# are asserted in the same clock 
as BOFF#, BOFF# takes effect. 


CLK 


ADS# 


A31—-A2 
M/lO# 
D/C# 
BE3#-BE0# 


RDY# 
BRDY# 
KEN# 
BOFF# 
BLAST# 


DATA 


+ To Processor 


242202-097 


Figure 10-33. Restarted Read Cycle 
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Figure 10-34. Restarted Write Cycle 


The device asserting BOFF# is free to run cycles while the Intel486 processor bus is in its high 
impedance state. If backoff is requested after the Intel486 processor has started a cycle, the new 
master should wait for memory to assert RDY# or BRDY# before assuming control of the bus. 
Waiting for RDY# or BRDY# provides a handshake to ensure that the memory system is ready 
to accept a new cycle. If the bus is idle when BOFF# is asserted, the new master can start its cycle 
two clocks after issuing BOFF#. 


The external memory can view BOFF# in the same manner as BLAST#. Asserting BOFF# tells 
the external memory system that the current cycle is the last cycle in a transfer. 


The bus remains in the high impedance state until BOFF# is deasserted. Upon negation, the 
Intel486 processor restarts its bus cycle by driving out the address and status and asserting ADS#. 
The bus cycle then continues as usual. 


Asserting BOFF# during a burst, BS8#, or BS16# cycle forces the Intel486 processor to ignore 
data returned for that cycle only. Data from previous cycles is still valid. For example, if BOFF# 
is asserted on the third BRDY# of a burst, the Intel486 processor assumes the data returned with 
the first and second BRDY# is correct and restarts the burst beginning with the third item. The 
same rule applies to transfers broken into multiple cycles by BS8# or BS16#. 
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Asserting BOFF# in the same clock as ADS# causes the Intel486 processor to float its bus in the 
next clock and leave ADS# floating low. Because ADS# is floating low, a peripheral may think 
that a new bus cycle has begun even though the cycle was aborted. There are two possible solu- 
tions to this problem. The first is to have all devices recognize this condition and ignore ADS# 
until RDY# is asserted. The second approach is to use a “two clock” backoff: in the first clock 
AHOLD is asserted, and in the second clock BOFF# is asserted. This guarantees that ADS# is 
not floating low. This is necessary only in systems where BOFF# may be asserted in the same 
clock as ADS#. 


10.3.13 Bus States 


A bus state diagram is shown in Figure 10-35. A description of the signals used in the diagram is 
given in Table 10-10. 


(RDY# Asserted + (BRDY# - BLAST#) Asserted) - 
(HOLD + AHOLD + No Request) - BOFF# Deasserted 


Request Pending : (RDY# Asserted + 
(BRDY# - BLAST#) Asserted) - 
HOLD Deasserted : AHOLD Deasserted - BOFF# Deasserted 


Request Pending - 
HOLD Deasserted - 


au AHOLD Deasserted - 


BOFF# Deasserted 


BOFF# 
Asserted 


BOFF# 
Deasserted 


AHOLD Deasserted - 
BOFF# Deasserted - 
(HOLD) Deasserted* 


* HOLD is only factored into this state transition if T, was 240950069 
entered while a non-cacheable. non-burst, code prefetch was 
in progress. Otherwise, ignore HOLD. 


Figure 10-35. Bus State Diagram 
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Table 10-10. Bus State Description 


State Means 


Ti | Bus is idle. Address and status signals may be driven to undefined values, or the bus may be floated 
to a high impedance state. 


T1 | First clock cycle of a bus cycle. Valid address and status are driven and ADS# is asserted. 


T2 | Second and subsequent clock cycles of a bus cycle. Data is driven if the cycle is a write, or data is 
expected if the cycle is a read. RDY# and BRDY# are sampled. 


T1p | First clock cycle of a restarted bus cycle. Valid address and status are driven and ADS# is asserted. 


Ty | Second and subsequent clock cycles of an aborted bus cycle. 


10.3.14 Floating-Point Error Handling for the IntelDX2™ and IntelDX4™ 
Processors 


The IntelDX2 and IntelDX4 processors provide two options for reporting floating-point errors. 
The simplest method is to raise interrupt 16 whenever an unmasked floating-point error occurs. 
This option may be enabled by setting the NE bit in control register 0 (CRO). 


The IntelDX2 and IntelDX4 processors also provide the option of allowing external hardware to 
determine how floating-point errors are reported. This option is necessary for compatibility with 
the error reporting scheme used in DOS-based systems. The NE bit must be cleared in CRO to 
enable user-defined error reporting. User-defined error reporting is the default condition because 
the NE bit is cleared on reset. 


Two pins, floating-point error (FERR#, an output) and ignore numeric error (IGNNE#, an input) 
are provided to direct the actions of hardware if user-defined error reporting is used. The 
IntelDX2 and IntelDX4 processors assert the FERR# output to indicate that a floating-point error 
has occurred. FERR# corresponds to the ERROR# pin on the Intel387™ math coprocessor. How- 
ever, there is a difference in the behavior of the two. 


In some cases FERR# is asserted when the next floating-point instruction is encountered, and in 
other cases it is asserted before the next floating-point instruction is encountered, depending upon 
the execution state of the instruction causing the exception. 
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10.3.14.1 Floating-Point Exceptions 


The following class of floating-point exceptions drive FERR# at the time the exception occurs 
(i.e., before encountering the next floating-point instruction). 


1. The stack fault, invalid operation, and denormal exceptions on all transcendental 
instructions, integer arithmetic instructions, FSQRT, FSEALE, FPREM(1), FXTRACT, 
FBLD, and FBSTP. 


2. Any exceptions on store instructions (including integer store instructions). 


The following class of floating-point exceptions drive FERR# only after encountering the next 
floating-point instruction. 


3. Exceptions other than on all transcendental instructions, integer arithmetic instructions, 
FSQRT, FSCALE, FPREM(1), FXTRACT, FBLD, and FBSTP. 


4. Any exception on all basic arithmetic, load, compare, and control instructions (i.e., all 
other instructions). 


For both sets of exceptions above, the Intel387 math coprocessor asserts ERROR# when the error 
occurs and does not wait for the next floating-point instruction to be encountered. 


IGNNE+# is an input to the IntelDX2 and IntelD X4 processors. When the NE bit in CRO is cleared, 
and IGNNE+# is asserted, the IntelD X2 and IntelD X4 processors ignore user floating-point errors 
and continue executing floating-point instructions. When IGNNE# is deasserted, the IGNNE# is 
an input to these processors that freeze on floating-point instructions that get errors (except for 
the control instructions FNCLEX, FNINIT, FNSAVE, FNSTENV, FNSTCW, FNSTSW, 
FNSTSW AX, FNENI, FNDISI and FNSETPM). IGNNE# may be asynchronous to the IntelDX2 
and IntelDX4 processor clock. 


In systems with user-defined error reporting, the FERR# pin is connected to the interrupt control- 
ler. When an unmasked floating-point error occurs, an interrupt is raised. If IGNNE# is high at 
the time of this interrupt, the IntelDX2 and IntelDX4 processors freeze (disallowing execution of 
a subsequent floating-point instruction) until the interrupt handler is invoked. By driving the 
IGNNE# pin low (when clearing the interrupt request), the interrupt handler can allow execution 
of a floating-point instruction, within the interrupt handler, before the error condition is cleared 
(by FNCLEX, FNINIT, FNSAVE or FNSTENV). If execution of a non-control floating-point in- 
struction, within the floating-point interrupt handler, is not needed, the IGNNE# pin can be tied 
high. 
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10.3.15 IntelDX2™ and IntelDX4™ Processors Floating-Point Error Handling in 
AT-Compatible Systems 


The IntelDX2 and IntelDX4 processors provide special features to allow the implementation of 
an AT-compatible numerics error reporting scheme. These features DO NOT replace the external 
circuit. Logic is still required that decodes the OUT FO instruction and latches the FERR# signal. 
The use of these Intel Processor features is described below. 


¢ The NE bit in the Machine Status Register 
¢ The IGNNE# pin 
¢ The FERR# pin 


The NE bit determines the action taken by the IntelDX2 and IntelDX4 processors when a numer- 
ics error is detected. When set, this bit signals that non-DOS compatible error handling is imple- 
mented. In this mode the IntelDX2 and IntelDX4 processors take a software exception (16) if a 
numerics error is detected. 


If the NE bit is reset, the IntelDX2 and IntelDX4 processors use the IGNNE# pin to allow an ex- 
ternal circuit to control the time at which non-control numerics instructions are allowed to exe- 
cute. Note that floating-point control instructions such as FNINIT and FNSAVE can be executed 
during a floating-point error condition regardless of the state of IGNNE#. 


To process a floating-point error in the DOS environment, the following sequence must take 
place: 


1. The error is detected by the IntelDX2 and IntelDX4 processor that activates the FERR# 
pin. 


2. FERR# is latched so that it can be cleared by the OUT FO instruction. 


3. The latched FERR# signal activates an interrupt at the interrupt controller. This interrupt 
is usually handled on IRQ13. 


4. The Interrupt Service Routine (ISR) handles the error and then clears the interrupt by 
executing an OUT instruction to port FO. The address FO is decoded externally to clear the 
FERR# latch. The IGNNE# signal is also activated by the decoder output. 


5. Usually the ISR then executes an FNINIT instruction or other control instruction before 
restarting the program. FNINIT clears the FERR# output. 


Figure 10-36 illustrates a sample circuit that performs the function described above. Note that this 
circuit has not been tested and is included as an example of required error handling logic. 


Note that the IGNNE# input allows non-control instructions to be executed prior to the time the 
FERR# signal is reset by the IntelIDX2 and IntelDX4 processors. This function is implemented 
to allow exact compatibility with the AT implementation. Most programs re-initialize the Float- 
ing-Point Unit (FPU) before continuing after an error is detected. The FPU can be re-initialized 
using one of the following four instructions: FCLEX, FINIT, FSAVE and FSTENV. 
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Figure 10-36. DOS-Compatible Numerics Error Circuit 
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10.4 ENHANCED BUS MODE OPERATION FOR THE WRITE-BACK ENHANCED 
IntelDX4™ PROCESSOR 


All Intel486™ processors operate in Standard Bus (write-through) mode. However, when the in- 
ternal cache of the Write-Back Enhanced IntelDX4 processor is configured in write-back mode, 
the processor bus operates in the Enhanced Bus mode. This section describes how the Write-Back 
Enhanced Intel486 processor bus operation changes for the Enhanced Bus mode when the inter- 
nal cache is configured in write-back mode. 


10.4.1: Summary of Bus Differences 


The following is a list of the differences between the Enhanced Bus and Standard Bus modes. In 
Enhanced Bus mode: 


1. Burst write capability is extended to four doubleword burst cycles (for write-back cycles 
only). 


2. Four new signals: INV, WB/WT#, HITM#, and CACHE#, have been added to support the 
write-back operation of the internal cache. These signals function the same as the 
equivalent signals on the Pentium® OverDrive® processor pins. 


3. The SRESET signal has been modified so that it does not write back, invalidate, or disable 
the cache. Special test modes are also not initiated through SRESET. 


4. The FLUSH# signal behaves the same as the WBINVD instruction. Upon assertion, 
FLUSH# writes back all modified lines, invalidates the cache, and issues two special bus 
cycles. 


5. The PLOCK# signal remains deasserted. 


10.4.2 Burst Cycles 


Figure 10-37 shows a basic burst read cycle of the Write-Back Enhanced IntelDX4 processor. In 
the Enhanced Bus mode, both PCD and CACHE# are asserted if the cycle is internally cacheable. 
The Write-Back Enhanced IntelDX4 processor samples KEN# in the clock before the first 
BRDY#. If KEN# is asserted by the system, this cycle is transformed into a multiple-transfer cy- 
cle. With each data item returned from external memory, the data is “cached” only if KEN# is 
asserted again in the clock before the last BRD Y# signal. Data is sampled only in the clock in 
which BRDY# is asserted. If the data is not sent to the processor every clock, it causes a “slow 
burst” cycle. 
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Figure 10-37. Basic Burst Read Cycle 
10.4.2.1 Non-Cacheable Burst Operation 


When CACHE# is asserted on a read cycle, the processor follows with BLAST# high when 
KEN# is asserted. However, the converse is not true. The Write-Back Enhanced IntelDX4 pro- 
cessor may elect to read burst data that are identified as non-cacheable by either CACHE# or 
KEN+#. In this case, BLAST# is also high in the same cycle as the first BRD Y# (in clock four). 
To improve performance, the memory controller should try to complete the cycle as a burst cycle. 


The assertion of CACHE# on a write cycle signifies a replacement or snoop write-back cycle. 
These cycles consist of four doubleword transfers (either bursts or non-burst). The signals KEN# 
and WB/WT# are not sampled during write-back cycles because the processor does not attempt 
to redefine the cacheability of the line. 
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10.4.2.2 Burst Cycle Signal Protocol 


The signals from ADS# through BLAST#, which are shown in Figure 10-37, have the same func- 
tion and timing in both Standard Bus and Enhanced Bus modes. Burst cycles can be up to 16- 
bytes long (four aligned doublewords) and can start with any one of the four doublewords. The 
sequence of the addresses is determined by the first address and the sequence follows the order 
shown in Table 10-8 on page 10-28. The burst order for reads is the same as the burst order for 
writes. (See Section 10.3.4.2, “Burst and Cache Line Fill Order.’’) 


An attempted line fill caused by a read miss is indicated by the assertion of CACHE# and W/R#. 
For a line fill to occur, the system must assert KEN# twice: one clock prior to the first BRDY# 
and one clock prior to last BRDY#. It takes only one deassertion of KEN# to mark the line as 
non-cacheable. A write-back cycle of a cache line, due to replacement or snoop, is indicated by 
the assertion of CACHE# low and W/R# high. KEN# has no effect during write-back cycles. 
CACHE+# is valid from the assertion of ADS# through the clock in which the first RDY# or 
BRDY# is asserted. CACHE# is deasserted at all other times. PCD behaves the same in Enhanced 
Bus mode as in Standard Bus mode, except that it is low during write-back cycles. 


The Write-Back Enhanced IntelD X4 processor samples WB/WT# once, in the same clock as the 
first BRDY#. This sampled value of WB/WT# is combined with PWT to bring the line into the 
internal cache, either as a write-back line or write-through line. 


10.4.3 Cache Consistency Cycles 


The system performs snooping to maintain cache consistency. Snoop cycles can be performed 
under AHOLD, BOFF#, or HOLD, as described in Table 10-11. 


Table 10-11. Snoop Cycles under AHOLD, BOFF#, or HOLD 


Floats the address bus. ADS# is asserted under AHOLD only to initiate a snoop write-back cycle. 
An ongoing burst cycle is completed under AHOLD. For non-burst cycles, a specific non-burst 
transfer (ADS#-RDY# transfer) is completed under AHOLD and fractured before the next 
assertion of ADS#. A snoop write-back cycle is reordered ahead of a fractured non-burst cycle 
and the non-burst cycle is completed only after the snoop write-back cycle is completed, 
provided there are no other snoop write-back cycles scheduled. 


AHOLD 


Overrides AHOLD and takes effect in the next clock. On-going bus cycles will stop in the clock 
BOFF# | following the assertion of BOFF# and resume when BOFF# is de-asserted. The snoop write-back 
cycle begins after BOFF# is de-asserted followed by the backed-off cycle. 


HOLD is acknowledged only between bus cycles, except for a non-cacheable, non-burst code 
HOLD prefetch cycle. In a non-cacheable, non-burst code prefetch cycle, HOLD is acknowledged after 

the system asserts RDY#. Once HOLD is asserted, the processor blocks all bus activities until 
the system releases the bus (by de-asserting HOLD). 
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The snoop cycle begins by checking whether a particular cache line has been “cached” and inval- 
idates the line based on the state of the INV pin. If the Write-Back Enhanced IntelDX4 processor 
is configured in Enhanced Bus mode, the system must drive INV high to invalidate a particular 
cache line. The Write-Back Enhanced IntelDX4 processor does not have an output pin to indicate 
a snoop hit to an S-state line or an E-state line. However, the Write-Back Enhanced IntelDX4 pro- 
cessor invalidates the line if the system snoop hits an S-state, E-state, or M-state line, provided 
INV was driven high during snooping. If INV is driven low during a snoop cycle, a modified line 
is written back to memory and remains in the cache as a write-back line; a write-through line also 
remains in the cache as a write-through line. 


After asserting AHOLD or BOFF#, the external bus master driving the snoop cycle must wait for 
two clocks before driving the snoop address and asserting EADS#. If snooping is done under 
HOLD, the master performing the snoop must wait for at least one clock cycle before driving the 
snoop addresses and asserting EADS#. INV should be driven low during read operations to min- 
imize invalidations, and INV should be driven high to invalidate a cache line during write oper- 
ations. The Write-Back Enhanced IntelDX4 processor asserts HITM# if the cycle hits a modified 
line in the cache. This output signal becomes valid two clock periods after EADS# is valid on the 
bus. HITM# remains asserted until the modified line is written back and remains asserted until 
the RDY# or BRD Y# of the snoop cycle is asserted. Snoop operations could interrupt an ongoing 
bus operation in both the Standard Bus and Enhanced Bus modes. The Write-Back Enhanced 
IntelD X4 processor can accept EADS# in every clock period while in Standard Bus mode. In En- 
hanced Bus mode, the Write-Back Enhanced IntelDX4 processor can accept EADS# every other 
clock period or until a snoop hits an M-state line. The Write-Back Enhanced IntelDX4 processor 
does not accept any further snoop cycles inputs until the previous snoop write-back operation is 
completed. 


All write-back cycles adhere to the burst address sequence of 0-4-8-C. The CACHE#, PWT, and 
PCD output pins are asserted and the KEN# and WB/WT# input pins are ignored. Write-back cy- 
cles can be either burst or non-burst. All write-back operations write 16 bytes of data to memory 
corresponding to the modified line that hit during the snoop. 


NOTE 
Note that the Write-Back Enhanced IntelDX4 processor accepts BS8# and 
BS 16# line-fill cycles, but not on replacement or snoop-forced write-back 
cycles. 
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Snoop Collision with a Current Cache Line Operation 


BUS OPERATION 


The system can also perform snooping concurrent with a cache access and may collide with a cur- 
rent cache bus cycle. Table 10-12 lists some scenarios and the results of a snoop operation col- 
liding with an on-going cache fill or replacement cycle. 


Table 10-12. Various Scenarios of a Snoop Write-Back Cycle Colliding with an On-Going 
Cache Fill or Replacement Cycle 


Arbi- Snoop to the Line Snoop to a Different Snoop to the Line Snoop to a Different 
tration That Is Being Filled Line than the Line That Is Being Line than the Line 
Control Being Filled Replaced Being Replaced 
AHOLD | Read all line fill data | Complete fill if the Complete replacement | Complete replacement 

into cache line buffer. | cycle is burst. Start write-back if the cycle | write-back if it is a burst 
Update cache only if | S°0P write-back. is burst. Processor cycle. Initiate snoop 
snoop occurred with | If the cycle is non- does not initiate a write-back. 
INV =0 burst, the snoop write- | S.00p write-back, but | if the replacement write- 
No write-back cycle | back is reordered ae HITMé# until back is a non-burst cycle, 
because the line has | ahead of the line fill. | the replacement write | the snoop write-back 
Pe , back is completed. cycle is re-ordered in 
not been modified After the snoop write- 
yet. back cycle is If the replacement front of the replacement 
completed, continue cycle is non-burst, the cycle. After the snoop 
with line fill. snoop write-back is re- | write-back, the 
ordered ahead of the replacement write-back 
replacement write- is continued from the 
back cycle. The interrupt point. 
processor does not 
continue with the 
replacement write- 
back cycle. 

BOFF# | Stop reading line fill | Stop fill Stop replacement Stop replacement write- 
data Wait for BOFF# to be write-back back 
Wait for BOFF# to be | deasserted. Wait for BOFF# to be =| Wait for BOFF# to be de- 
deasserted. Do snoop write-back deasserted. asserted 
enone Continue fill from Initiate snoop write- Initiate snoop write-back 

__ [interrupt point. back Continue replacement 
Update cache only if Processor does not write-back from point of 
Snoop occurred with continue replacement _| interrupt. 
INV = 0". write-back. 

HOLD |HOLD is not acknowledged until the current bus cycle (i.e., the line operation) is completed, 
except for a non-cacheable, non-burst code prefetch cycle. Consequently there can be no 
collision with the snoop cycles using HOLD, except as mentioned earlier. In this case the snoop 
write-back is re-ordered ahead of an on-going non-burst, non-cached code prefetch cycle. After 
the write-back cycle is completed, the code prefetch cycle continues from the point of interrupt. 
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10.4.3.2 Snoop under AHOLD 


Snooping under AHOLD begins by asserting AHOLD to force the Write-Back Enhanced 
IntelD X4 processor to float the address bus, as shown in Figure 10-38. The ADS# for the write- 
back cycle is guaranteed to occur no sooner than the second clock following the assertion of 
HITM# (i.e., there is a dead clock between the assertion of HITM# and the first ADS# of the 
snoop write-back cycle). 


When a line is written back, KEN#, WB/WT#, BS8#, and BS16# are ignored, and PWT and PCD 
are always low during write-back cycles. 
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Figure 10-38. Snoop Cycle Invalidating a Modified Line 
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The next ADS# for a new cycle can occur immediately after the last RDY# or BRDY# of the 
write-back cycle. The Write-Back Enhanced IntelDX4 processor does not guarantee a dead clock 
between cycles unless the second cycle is a snoop-forced write-back cycle. This allows snoop- 
forced write-backs to be backed off (BOFF#) when snooping under AHOLD. 


HITM# is guaranteed to remain asserted until the RDY# or BRDY# signals corresponding to the 
last doubleword of the write-back cycle is returned. HITM# is de-asserted from the clock edge in 
which the last BRDY# or RDY# for the snoop write-back cycle is asserted. The write-back cycle 
could be a burst or non-burst cycle. In either case, 16 bytes of data corresponding to the modified 
line that has a snoop hit is written back. 


Snoop under AHOLD Overlaying a Line-Fill Cycle 


The assertion of AHOLD during a line fill is allowed on the Write-Back Enhanced IntelDX4 pro- 
cessor. In this case, when a snoop cycle is overlaid by an on-going line-fill cycle, the chipset must 
generate the burst addresses internally for the line fill to complete, because the address bus has 
the valid snoop address. The write-back mode is more complex compared to the write-through 
mode because of the possibility of a line being written back. Figure 10-39 shows a snoop cycle 
overlaying a line-fill cycle, when the snooped line is not the same as the line being filled. 


In Figure 10-39, the snoop to an M-state line causes a snoop write-back cycle. The Write-Back 
Enhanced IntelDX4 processor asserts HITM# two clocks after the EADS#, but delays the snoop 
write-back cycle until the line fill is completed, because the line fill shown in Figure 10-39 is a 
burst cycle. In this figure, AHOLD is asserted one clock after ADS#. In the clock after AHOLD 
is asserted, the Write-Back Enhanced IntelDX4 processor floats the address bus (not the Byte En- 
ables). Hence, the memory controller must determine burst addresses in this period. The chipset 
must comprehend the special ordering required by all burst sequences of the Write-Back En- 
hanced IntelDX4 processor. HITM# is guaranteed to remain asserted until the write-back cycle 
completes. 


If AHOLD continues to be asserted over the forced write-back cycle, the memory controller also 
must supply the write-back addresses to the memory. The Write-Back Enhanced IntelDX4 pro- 
cessor always runs the write-back with an address sequence of 0-4-8-C. 


In general, if the snoop cycle overlays any burst cycle (not necessarily a line-fill cycle) the snoop 
write-back is delayed because of the on-going burst cycle. First, the burst cycle goes to comple- 
tion and only then does the snoop write-back cycle start. 
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Figure 10-39. Snoop Cycle Overlaying a Line-Fill Cycle 
AHOLD Snoop Overlaying a Non-Burst Cycle 


When AHOLD overlays a non-burst cycle, snooping is based on the completion of the current 
non-burst transfer (ADS#-RDY# transfer). Figure 10-40 shows a snoop cycle under AHOLD 
overlaying a non-burst line-fill cycle. HITM# is asserted two clocks after EADS#, and the non- 
burst cycle is fractured after the RD Y# for a specific single transfer is asserted. The snoop write- 
back cycle is re-ordered ahead of an ongoing non-burst cycle. After the write-back cycle is com- 
pleted, the fractured non-burst cycle continues. The snoop write-back ALWAYS precedes the 
completion of a fractured cycle, regardless of the point at which AHOLD is de-asserted, and 
AHOLD must be de-asserted before the fractured non-burst cycle can complete. 
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Figure 10-40. Snoop Cycle Overlaying a Non-Burst Cycle 
AHOLD Snoop to the Same Line that is being Filled 


A system snoop does not cause a write-back cycle to occur if the snoop hits a line while the line 
is being filled. The processor does not allow a line to be modified until the fill is completed (and 
a snoop only produces a write-back cycle for a modified line). Although a snoop to a line that is 
being filled does not produce a write-back cycle, the snoop still has an effect based on the follow- 
ing rules: 


1. The processor always snoops the line being filled. 


2. Inall cases, the processor uses the operand that triggered the line fill. 
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3. Ifthe snoop occurs when INV = “1”, the processor never updates the cache with the fill 
data. 


4. Ifthe snoop occurs when INV = “0”, the processor loads the line into the internal cache. 


10.4.3.3. Snoop During Replacement Write-Back 


If the cache contains valid data during a line fill, one of the cache lines may be replaced as deter- 
mined by the Least Recently Used (LRU) algorithm. Refer to Chapter 7, “On-Chip Cache” for a 
detailed discussion of the LRU algorithm. If the line being replaced is modified, this line is writ- 
ten back to maintain cache coherency. When a replacement write-back cycle is in progress, it 
might be necessary to snoop the line that is being written back (see Figure 10-41). 
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Figure 10-41. Snoop to the Line that is Being Replaced 
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If the replacement write-back cycle is burst and there is a snoop hit to the same line as the line 
that is being replaced, the on-going replacement cycle runs to completion. HITM# is asserted un- 
til the line is written back and the snoop write-back is not initiated. In this case, the replacement 
write-back is converted to the snoop write-back, and HITM# is asserted and de-asserted without 
a specific ADS# to initiate the write-back cycle. 


If there is a snoop hit to a different line from the line being replaced, and if the replacement write- 
back cycle is burst, the replacement cycle goes to completion. Only then is the snoop write-back 
cycle initiated. 
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If the replacement write-back cycle is a non-burst cycle, and if there is a snoop hit to the same 
line as the line being replaced, it fractures the replacement write-back cycle after RD Y# is assert- 
ed for the current non-burst transfer. The snoop write-back cycle is reordered in front of the frac- 
tured replacement write-back cycle and is completed under HITM#. However, after AHOLD is 
deasserted, the replacement write-back cycle is not completed. 


If there is a snoop hit to a line that is different from the one being replaced, the non-burst replace- 
ment write-back cycle is fractured, and the snoop write-back cycle is reordered ahead of the re- 
placement write-back cycle. After the snoop write-back is completed, the replacement write-back 
cycle continues. 


10.4.3.4 Snoop under BOFF# 


BOFF# is capable of fracturing any transfer, burst or non-burst. The output pins (see Table 10-8 
and Table 10-12) of the Write-Back Enhanced IntelDX4 processor are floated in the clock period 
following the assertion of BOFF#. If the system snoop hits a modified line using BOFF#, the 
snoop write-back cycle is reordered ahead of the current cycle. BOFF# must be de-asserted for 
the processor to perform a snoop write-back cycle and resume the fractured cycle. The fractured 
cycle resumes with a new ADS# and begins with the first uncompleted transfer. Snoops are per- 
mitted under BOFF#, but write-back cycles are not started until BOFF# is de-asserted. Conse- 
quently, multiple snoop cycles can occur under a continuously asserted BOFF#, but only up to 
the first asserted HITM#. 


Snoop under BOFF# during Cache Line Fill 


As shown in Figure 10-42, BOFF# fractures the second transfer of a non-burst cache line-fill cy- 
cle. The system begins snooping by driving EADS# and INV in clock six. The assertion of 
HITM# in clock eight indicates that the snoop cycle hit a modified line and the cache line is writ- 
ten back to memory. The assertion of HITM# in clock eight and CACHE# and ADS# in clock ten 
identifies the beginning of the snoop write-back cycle. ADS# is guaranteed to be asserted no 
sooner than two clock periods after the assertion of HITM#. Write-back cycles always use the 
four-doubleword address sequence of 0-4-8-C (burst or non-burst). The snoop write-back cycle 
begins upon the de-assertion of BOFF# with HITM# asserted throughout the duration of the 
snoop write-back cycle. 


If the snoop cycle hits a line that is different from the line being filled, the cache line fill resumes 
after the snoop write-back cycle completes, as shown in Figure 10-42. 
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Figure 10-42. Snoop under BOFF# during a Cache Line-Fill Cycle 


An ADS# is always issued when a cycle resumes after being fractured by BOFF#. The address 
of the fractured data transfer is reissued under this ADS#, and CACHE# is not issued unless the 
fractured operation resumes from the first transfer (e.g., first doubleword). If the system asserts 
BOFF# and RDY# simultaneously, as shown in clock four on Figure 10-42, BOFF# dominates 
and RDY# is ignored. Consequently, the Write-Back Enhanced IntelDX4 processor accepts only 
up to the x4h doubleword, and the line fill resumes with the xOh doubleword. ADS# initiates the 
resumption of the line-fill operation in clock period 15. HITM# is de-asserted in the clock period 
following the clock period in which the last RDY# or BRD Y# of the write-back cycle is asserted. 
Hence, HITM# is guaranteed to be de-asserted before the ADS# of the next cycle. 
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Figure 10-42 also shows the system asserting RDY# to indicate a non-burst line-fill cycle. Burst 
cache line-fill cycles behave similarly to non-burst cache line-fill cycles when snooping using 
BOFF#. If the system snoop hits the same line as the line being filled (burst or non-burst), the 
Write-Back Enhanced IntelDX4 processor does not assert HITM# and does not issue a snoop 
write-back cycle, because the line was not modified, and the line fill resumes upon the de-asser- 
tion of BOFF#. However, the line fill is cached only if INV is driven low during the snoop cycle. 


Snoop under BOFF# during Replacement Write-Back 


If the system snoop under BOFF# hits the line that is currently being replaced (burst or non- 
burst), the entire line is written back as a snoop write-back line, and the replacement write-back 
cycle is not continued. However, if the system snoop hits a different line than the one currently 
being replaced, the replacement write-back cycle continues after the snoop write-back cycle has 
been completed. Figure 10-43 shows a system snoop hit to the same line as the one being replaced 
(non-burst). 
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Figure 10-43. Snoop under BOFF# to the Line that is Being Replaced 
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10.4.3.5 | Snoop under HOLD 


HOLD can only fracture a non-cacheable, non-burst code prefetch cycle. For all other cycles, the 
Write-Back Enhanced IntelDX4 processor does not assert HLDA until the entire current cycle is 
completed. If the system snoop hits a modified line under HLDA during a non-cacheable, non- 
burstable code prefetch, the snoop write-back cycle is reordered ahead of the fractured cycle. The 
fractured non-cacheable, non-burst code prefetch resumes with an ADS# and begins with the first 
uncompleted transfer. Snoops are permitted under HLDA, but write-back cycles do not occur un- 
til HOLD is de-asserted. Consequently, multiple snoop cycles are permitted under a continuously 
asserted HLDA only up to the first asserted HITM#. 


Snoop under HOLD during Cache Line Fill 


As shown in Figure 10-44, HOLD (asserted in clock two) does not fracture the burst cache line- 
fill cycle until the line fill is completed (in clock five). Upon completing the line fill in clock five, 
the Write-Back Enhanced IntelDX4 processor asserts HLDA and the system begins snooping by 
driving EADS# and INV in the following clock period. The assertion of HITM# in clock nine 
indicates that the snoop cycle has hit a modified line and the cache line is written back to memory. 
The assertion of HITM# in clock nine and CACHE# and ADS# in clock 11 identifies the begin- 
ning of the snoop write-back cycle. The snoop write-back cycle begins upon the de-assertion of 
HOLD, and HITM# is asserted throughout the duration of the snoop write-back cycle. 
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Figure 10-44. Snoop under HOLD during Line Fill 


If HOLD is asserted during a non-cacheable, non-burst code prefetch cycle, as shown in 
Figure 10-45, the Write-Back Enhanced IntelDX4 processor issues HLDA in clock seven (which 
is the clock period in which the next RDY# is asserted). If the system snoop hits a modified line, 
the snoop write-back cycle begins after HOLD is released. After the snoop write-back cycle is 
completed, an ADS# is issued and the code prefetch cycle resumes. 
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Figure 10-45. Snoop using HOLD during a Non-Cacheable, Non-Burstable Code Prefetch 


10.4.3.6 Snoop under HOLD during Replacement Write-Back 


Collision of snoop cycles under a HOLD during the replacement write-back cycle can never oc- 
cur, because HLDA is asserted only after the replacement write-back cycle (burst or non-burst) 
is completed. 
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10.4.4 Locked Cycles 


In both Standard and Enhanced Bus modes, the Write-Back Enhanced IntelDX4 processor archi- 
tecture supports atomic memory access. A programmer can modify the contents of a memory 
variable and be assured that the variable is not accessed by another bus master between the read 
of the variable and the update of that variable. This function is provided for instructions that con- 
tain a LOCK prefix, and also for instructions that implicitly perform locked read modify write 
cycles. In hardware, the LOCK function is implemented through the LOCK# pin, which indicates 
to the system that the processor is performing this sequence of cycles, and that the processor 
should be allowed atomic access for the location accessed during the first locked cycle. 


A locked operation is a combination of one or more read cycles followed by one or more write 
cycles with the LOCK# pin asserted. Before a locked read cycle is run, the processor first deter- 
mines if the corresponding line is in the cache. If the line is present in the cache, and is in an E or 
S state, it is invalidated. If the line is in the M state, the processor does a write-back and then in- 
validates the line. A locked cycle to an M, S, or E state line is always forced out to the bus. If the 
operand is misaligned across cache lines, the processor could potentially run two write back cy- 
cles before starting the first locked read. In this case the sequence of bus cycles is: write back, 
write back, locked read, locked read, locked write and the final locked write. Note that although 
a total of six cycles are generated, the LOCK# pin is asserted only during the last four cycles, as 
shown in Figure 10-46. 


LOCK# is not deasserted if AHOLD is asserted in the middle of a locked cycle. LOCK# remains 
asserted even if there is a snoop write-back during a locked cycle. LOCK# is floated if BOFF# is 
asserted in the middle of a locked cycle. However, it is driven LOW again when the cycle restarts 
after BOFF#. Locked read cycles are never transformed into line fills, even if KEN# is asserted. 
If there are back-to-back locked cycles, the Write-Back Enhanced IntelDX4 processor does not 
insert a dead clock between these two cycles. HOLD is recognized if there are two back-to-back 
locked cycles, and LOCK# floats when HLDA is asserted. 
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Figure 10-46. Locked Cycles (Back-to-Back) 
10.4.4.1 Snoop/Lock Collision 


If there is a snoop cycle overlaying a locked cycle, the snoop write-back cycle fractures the 
locked cycle. As shown in Figure 10-47, after the read portion of the locked cycle is completed, 
the snoop write-back starts under HITM#. After the write-back is completed, the locked cycle 
continues. But during all this time (including the write-back cycle), the LOCK# signal remains 
asserted. 


Because HOLD is not acknowledged if LOCK# is asserted, snoop-lock collisions are restricted 
to AHOLD and BOFF# snooping. 
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Figure 10-47. Snoop Cycle Overlaying a Locked Cycle 


10.4.5 Flush Operation 


The Write-Back Enhanced IntelDX4 processor executes a flush operation when the FLUSH# pin 
is asserted, and no outstanding bus cycles, such as a line fill or write back, are being processed. 
In the Enhanced Bus mode, the processor first writes back all the modified lines to external mem- 
ory. After the write-back is completed, two special cycles are generated, indicating to the external 
system that the write-back is done. All lines in the internal cache are invalidated after all the 
write-back cycles are done. Depending on the number of modified lines in the cache, the flush 
could take a minimum of 1280 bus clocks (2560 processor clocks) and up to a maximum of 5000+ 
bus clocks to scan the cache, perform the write backs, invalidate the cache, and run the flush ac- 
knowledge cycles. FLUSH# is implemented as an interrupt in the Enhanced Bus mode, and is rec- 
ognized only on an instruction boundary. Write-back system designs should look for the flush 
acknowledge cycles to recognize the end of the flush operation. Figure 10-48 shows the flush op- 
eration of the Write-Back Enhanced IntelDX4 processor when configured in the Enhanced Bus 
mode. 
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If the processor is in Standard Bus mode, the processor does not issue special acknowledge cycles 
in response to the FLUSH# input, although the internal cache is invalidated. The invalidation of 
the cache in this case, takes only two bus clocks. 
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Figure 10-48. Flush Cycle 


10.4.6 Pseudo Locked Cycles 


In Enhanced Bus mode, PLOCK# is always deasserted for both burst and non-burst cycles. 
Hence, it is possible for other bus masters to gain control of the bus during operand transfers that 
take more than one bus cycle. A 64-bit aligned operand can be read in one burst cycle or two non- 
burst cycles if BS8# and BS16# are not asserted. Figure 10-49 shows a 64-bit floating-point op- 
erand or Segment Descriptor read cycle, which is burst by the system asserting BRDY#. 


10.4.6.1 Snoop under AHOLD during Pseudo-Locked Cycles 


AHOLD can fracture a 64-bit transfer if it is a non-burst cycle. If the 64-bit cycle is burst, as 
shown in Figure 10-49, the entire transfer goes to completion and only then does the snoop write- 
back cycle start. 
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Figure 10-49. Snoop under AHOLD Overlaying Pseudo-Locked Cycle 
10.4.6.2 | Snoop under Hold during Pseudo-Locked Cycles 


As shown in Figure 10-50, HOLD does not fracture the 64-bit burst transfer. The Write-Back En- 
hanced IntelDX4 processor does not issue HLDA until clock four. After the 64-bit transfer is 
completed, the Write-Back Enhanced IntelDX4 processor writes back the modified line to mem- 
ory (if snoop hits a modified line). If the 64-bit transfer is non-burst, the Write-Back Enhanced 
IntelD X4 processor can issue HLDA in between bus cycles for a 64-bit transfer. 
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Figure 10-50. Snoop under HOLD Overlaying Pseudo-Locked Cycle 
10.4.6.3 Snoop under BOFF# Overlaying a Pseudo-Locked Cycle 


BOFF# is capable of fracturing any bus operation. In Figure 10-51, BOFF# fractured a current 
64-bit read cycle in clock four. If there is a snoop hit under BOFF#, the snoop write-back opera- 
tion begins after BOFF# is deasserted. The 64-bit write cycle resumes after the snoop write-back 
operation completes. 
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Figure 10-51. Snoop under BOFF# Overlaying a Pseudo-Locked Cycle 
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CHAPTER 11 
DEBUGGING SUPPORT 


The Intel486™ processor provides several features that simplify the debugging process. The 
three categories of on-chip debugging aids are: 


1. Code execution breakpoint opcode (OCCH) 
2. Single-step capability provided by the TF bit in the Flag register 


3. Code and data breakpoint capability provided by the Debug Registers DR[3:0], DR6, and 
DR7 


11.1 BREAKPOINT INSTRUCTION 


A single-byte opcode breakpoint instruction is available for use by software debuggers. The 
breakpoint opcode, OCCH, generates an exception 3 trap when executed. In typical use, a debug- 
ger program “plants” the breakpoint instruction at all desired code execution breakpoints. The 
single-byte breakpoint opcode is an alias for the two-byte general software interrupt instruction 
INT n, where n=3. The only difference between INT 3 (OCCh) and INT n is that INT 3 is never 
IOPL-sensitive, whereas INT n is IOPL-sensitive in Protected Mode and Virtual 8086 Mode. 


11.2 SINGLE-STEP TRAP 


When the single-step flag (TF, bit 8) in the EFLAG register is set at the end of an instruction, a 
single-step exception occurs. The single-step exception is auto vectored to exception number 1. 
Precisely, exception | occurs as a trap after the instruction following the instruction that set TF. 
In typical practice, a debugger sets the TF bit of a flag register image on the debugger's stack. 
Typically, it then transfers control to the user program and loads the flag image with a signal in- 
struction, the IRET instruction. The single-step trap occurs after executing one instruction of the 
user program. 


Because exception | occurs as a trap (that is, it occurs after the instruction has executed), the 
CS:EIP pushed onto the debugger's stack points to the next unexecuted instruction of the program 
being debugged. Therefore, by ending with an IRET instruction, an exception | handler can effi- 
ciently support single-stepping through a user program. 
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11.3. DEBUG REGISTERS 


The Debug Registers are an advanced debugging feature of the Intel486 processor. They allow 
data access breakpoints and code execution breakpoints. Because the breakpoints are indicated 
by on-chip registers, an instruction execution breakpoint can be placed in ROM code or in code 
shared by several tasks, neither of which can be supported by the INT3 breakpoint opcode. 


The Intel486 processor contains six Debug Registers, providing the ability to specify up to four 
distinct breakpoint addresses, breakpoint control options, and read breakpoint status. Initially af- 
ter reset, breakpoints are in the disabled state. Therefore, no breakpoints occur unless the debug 
registers are programmed. Breakpoints set up in the Debug Registers are auto-vectored to excep- 
tion number 1. 


11.3.1 Linear Address Breakpoint Registers (DR[3:0]) 


Up to four breakpoint addresses can be specified by writing to Debug Registers DR[3:0], shown 
in Figure 11-1. The breakpoint addresses specified are 32-bit linear addresses. Intel486 processor 
hardware continuously compares the linear breakpoint addresses in DR[3:0] with the linear ad- 
dresses generated by executing software (a linear address is the result of computing the effective 
address and adding the 32-bit segment base address). Note that when paging is not enabled, the 
linear address equals the physical address. If paging is enabled, the linear address is translated to 
a physical 32-bit address by the on-chip paging unit. Regardless of whether paging is enabled or 
not, however, the breakpoint registers hold linear addresses. 


11.3.2 Debug Control Register (DR7) 


A Debug Control Register, DR7 shown in Figure 11-1, allows several debug control functions, 
such as enabling the breakpoints and setting up other control options for the breakpoints. The 
fields within the Debug Control Register, DR7, are as follows: 
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31 16 15 0 
Breakpoint 0 Linear DRO 
Address 
Breakpoint 1 Linear DRI 
Address 
Breakpoint 2 Linear DR2 
Address 
Breakpoint 3 Linear DR3 
Address 
Intel Reserved. Do not DR4 
define. 
Intel Reserved. Do not DR5 
define. 
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Figure 11-1. Debug Registers 


LENi (breakpoint length specification bits) 


A 2-bit LEN field exists for each of the four breakpoints. LEN specifies the length of the associ- 
ated breakpoint field. The choices for data breakpoints are: 1 byte, 2 bytes, and 4 bytes. Instruc- 
tion execution breakpoints must have a length of 1 (LENi = 00). Encoding of the LENi field is as 
described in Table 11-1. 


The LENi field controls the size of breakpoint field i by controlling whether all low-order linear 
address bits in the breakpoint address register are used to detect the breakpoint event. Therefore, 
all breakpoint fields are aligned: 2-byte breakpoint fields begin on word boundaries, and 4-byte 
breakpoint fields begin on dword boundaries. 


Figure | 1-2 is an example of various size breakpoint fields. Assume the breakpoint linear address 
in DR2 is OOOOOO00SH. In that situation, the Figure 11-2 indicates the region of the breakpoint 
field for lengths of 1, 2, or 4 bytes. 


RWi (memory access qualifier bits) 


A 2-bit RW field exists for each of the four breakpoints. The 2-bit RW field specifies the type of 
usage that must occur to activate the associated breakpoint. 
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Table 11-1. LENi Encoding 


LENi 
Encoding 


Breakpoint 
Field Width 


Usage of Least Significant Bits in Breakpoint Address 
Register i, (i=0-3) 


00 


1 byte 


All 32-bits used to specify a single-byte breakpoint field. 


01 


2 bytes 


A[31:1] used to specify a two-byte, word-aligned breakpoint 
field. AO in Breakpoint Address Register is not used. 


10 


Undefined—do not use 


this encoding 


11 


4 bytes 


DR2 = 00000005H; LEN2 = 00B 


31 


A[31:1] used to specify a four-byte, dword-aligned breakpoint 
field. AO and A1 in Breakpoint Address Register are not used. 


00000008H 


BKPT FLD2 00000004H 


00000000H 


DR2 = 00000005H; LEN2 = 01B 


00000008H 


~<— BKPT FLD2 —> 


00000004H 


31 


DR2 = 00000005H; LEN2 = 11B 


00000000H 


00000008H 


=z? — BKPT FLD2 —————_> 


Figure 11-2. Size Breakpoint Fields 
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RW encoding 00 is used to set up an instruction execution breakpoint. RW encodings 01 or 11 
are used to set up write-only or read/write data breakpoints. 


Table 11-2. RW Encoding 


RW Encoding Usage Causing Breakpoint 
00 Instruction execution only 
01 Data writes only 
10 Undefined—do not use this encoding 


11 Data reads and writes only 


Note that instruction execution breakpoints are taken as faults (i.e., before the instruction exe- 
cutes), but data breakpoints are taken as traps (i.e., after the data transfer takes place). 


Using LENi and RWi to Set Data Breakpoint i 


A data breakpoint can be set up by writing the linear address into DRi (i = 0-3). For data break- 
points, RWi can equal 01 (write-only) or 11 (write/read). LEN can equal 00, 01, or 11. 


When a data access entirely or partly falls within the data breakpoint field, the data breakpoint 
condition has occurred, and if the breakpoint is enabled, an exception | trap occurs. 


Using LENi and RWi to Set Instruction Execution Breakpoint i 


An instruction execution breakpoint can be set up by writing the address of the beginning of the 
instruction (including prefixes if any) into DRi (@ = 0-3). RWi must equal 00 and LEN must 
equal 00 for instruction execution breakpoints. 


When the instruction beginning at the breakpoint address is about to be executed, the instruction 
execution breakpoint condition has occurred, and if the breakpoint is enabled, an exception | fault 
occurs before the instruction is executed. 


Note that an instruction execution breakpoint address must be equal to the beginning byte address 
of an instruction (including prefixes) for the instruction execution breakpoint to occur. 


GD (Global Debug Register access detect) 


The Debug Registers can be accessed only in Real Mode or at privilege level 0 in Protected Mode. 
The GD bit, when set, provides extra protection against any Debug Register access even in Real 
Mode or at privilege level 0 in Protected Mode. This additional protection feature is provided to 
guarantee that a software debugger can have full control over the Debug Register resources when 
required. The GD bit, when set, causes an exception | fault when an instruction attempts to read 
or write any Debug Register. The GD bit is automatically cleared when the exception | handler 
is invoked, allowing the exception | handler free access to the debug registers. 
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GE and LE (Exact data breakpoint match, global and local) 


The breakpoint mechanism of the Intel486 processor differs from that of the Intel386 processor. 
The Intel486 processor always does exact data breakpoint matching, regardless of GE/LE bit set- 
tings. Any data breakpoint trap is reported exactly after completion of the instruction that caused 
the operand transfer. Exact reporting is provided by forcing the Intel486 processor execution unit 
to wait for completion of data operand transfers before beginning execution of the next instruc- 
tion. 


When the Intel486 processor performs a task switch, the LE bit is cleared. Thus, the LE bit sup- 
ports fast task switching out of tasks that have enabled the exact data breakpoint match for their 
task-local breakpoints. The LE bit is cleared by the Intel486 processor during a task switch to 
avoid having exact data breakpoint match enabled in the new task. Note that exact data breakpoint 
match must be re-enabled under software control. 


The Intel486 processor GE bit is unaffected during a task switch. The GE bit supports exact data 
breakpoint match that remains enabled during all tasks executing in the system. 


Note that instruction execution breakpoints are always reported exactly. 
Gi and Li (breakpoint enable, global and local) 


When either Gi or Li is set, then the associated breakpoint (as defined by the linear address in 
DRi, the length in LENi and the usage criteria in RWi) is enabled. When either Gi or Li is set, 
and the Intel486 processor detects the na breakpoint condition, the exception | handler is in- 
voked. 


When the Intel486 processor performs a task switch to a new Task State Segment (TSS), all Li 
bits are cleared. Thus, the Li bits support fast task switching out of tasks that use some task-local 
breakpoint registers. The Li bits are cleared by the Intel486 processor during a task switch to 
avoid spurious exceptions in the new task. Note that the breakpoints must be re-enabled under 
software control. 


All Intel486 processor Gi bits are unaffected during a task switch. The Gi bits support breakpoints 
that are active in all tasks executing in the system. 
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11.3.3 Debug Status Register (DR6) 


A Debug Status Register (DR6 shown in Figure 11-1) allows the exception | handler to easily 
determine why it was invoked. Note that the exception | handler can be invoked as a result of one 
of several events: 


¢ DRO Breakpoint fault/trap 

¢ DRI Breakpoint fault/trap 

¢ XDR2 Breakpoint fault/trap 

¢ XDR3 Breakpoint fault/trap 

¢ XSingle-step (TF) trap 

¢ XTask switch trap 

¢ XFault due to attempted debug register access when GD=1 


The Debug Status Register contains single-bit flags for each of the possible events that invoke 
exception 1. Note below that some of these events are faults (exception taken before the instruc- 
tion is executed), whereas other events are traps (exception taken after the debug events oc- 
curred). 


The flags in DR6 are set by hardware but never cleared by hardware. Exception | handler soft- 
ware should clear DR6 before returning to the user program to avoid future confusion in identi- 
fying the source of exception 1. 


The fields within the Debug Status Register, DR6, are as follows: 
Bi (debug fault/trap due to breakpoint 0-3) 


Four breakpoint indicator flags, B[3:0], correspond one-to-one with the breakpoint registers in 
DR[3:0]. A flag Bi is set when the condition described by DRi, LENi, and RWi occurs. 


If Gi or Li is set, and if the i breakpoint is detected, the Intel486 processor invokes the 
exception | handler. The exception is handled as a fault when an instruction execution breakpoint 
occurs, or as a trap if a data breakpoint occurs. 


NOTE 
A flag Bi is set whenever the hardware detects a match condition on enabled 
breakpoint i. When a match is detected on at least one enabled breakpoint i, the 
hardware immediately sets all Bi bits that correspond to breakpoint conditions 
matching at that instant, whether enabled or not. Although the exception 1| 
handler may see that multiple Bi bits are set, only those set Bi bits that 
correspond to enabled breakpoints (Li or Gi set) are true indications of why the 
exception | handler was invoked. 
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BD (debug fault due to attempted register access when GD bit set) 


This bit is set when the exception | handler is invoked due to an instruction that attempts to read 
or write to the debug registers when the GD bit was set. If such an event occurs, then the GD bit 
is automatically cleared when the exception | handler is invoked, allowing the handler access to 
the debug registers. 


BS (debug trap due to single-step) 


This bit is set when the exception | handler is invoked due to the TF bit in the flag register being 
set (for single-stepping). 


BT (debug trap due to task switch) 


This bit is set when the exception | handler was invoked due to a task switch that occurs on a task 
having an Intel486 processor TSS with the T bit set. Note the task switch into the new task occurs 
normally, but before the first instruction of the task is executed, the exception | handler is in- 
voked. With respect to the task switch operation, the operation is considered to be a trap. 


11.3.4 Use of Resume Flag(RF) in Flag Register 


The Resume Flag (RF) in the flag word can suppress an instruction execution breakpoint when 
the exception | handler returns to a user program at a user address that is also an instruction ex- 
ecution breakpoint. 


intel. 


Instruction Set 


Chapter Contents 
12.1 Instruction Set .0.........cccccceecsssecessececeesececesseeeessneceneaeeeenseeeess 12-1 
12.2 Instruction Encoding «0.00... eee ceeeeceeeeesseeceseceeecencecsaeeeneeeeeas 12-2 


12.3. Clock Count Summary 


intel. 


CHAPTER 12 
INSTRUCTION SET SUMMARY 


This chapter describes the entire encoding structure and provides definitions of all fields occur- 
ring within the Intel486™ processor instructions. Detailed information on the CPUID instruction 
can be found in Appendix D, “Feature Determination.” 


12.1 INSTRUCTION SET 
The Intel486 processor instruction set can be divided into the following categories of operations: 
¢ Data Transfer 
¢ Arithmetic 
¢ Shift/Rotate 
¢ String Manipulation 
¢ Bit Manipulation 
¢ Control Transfer 
¢ High Level Language Support 
¢ Operating System Support 
¢ Processor Control 


All Intel486 processor instructions operate on either 0, 1, 2 or 3 operands; where an operand re- 
sides in a register, in the instruction itself, or in memory. Most zero-operand instructions (e.g., 
CLI, STI) take only one byte. One-operand instructions generally are two bytes long. The average 
instruction is 3.2-bytes long. Because the Intel486 processor has a 32-byte instruction queue, an 
average of 10 instructions are prefetched. The use of two operands permits the following types of 
common instructions: 


° Register to register 

¢ Memory to register 

¢ Memory to memory 

¢ Immediate to register 
¢ Register to memory 

¢ Immediate to memory 


The operands can be 8-, 16-, or 32-bits long. As a general rule, when executing 32-bit code, op- 
erands are 8 or 32 bits; when executing existing 80286 or 8086 processor code (16-bit code), op- 
erands are 8 or 16 bits. Prefixes can be added to all instructions to override the default length of 
the operands (i.e., to use 32-bit operands for 16-bit code, or 16-bit operands for 32-bit code). 
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12.1.1 Floating-Point Instructions 


In addition to the instructions listed above, the IntelDX2 and IntelDX4 processors have floating- 
point instructions and Floating-Point Control instructions. Note that all Floating-Point Unit in- 
struction mnemonics begin with an F. 


12.2 INSTRUCTION ENCODING 


12.2.1 Overview 


All instruction encodings are subsets of the general instruction format shown in Figure 12-1. In- 
structions consist of one or two primary opcode bytes, possibly an address specifier consisting of 
the “mod 1/m” byte and “scaled index” byte, a displacement if required, and an immediate data 
field if required. 


Within the primary opcode or opcodes, smaller encoding fields may be defined. These fields vary 
according to the class of operation. The fields define such information as direction of the opera- 
tion, size of the displacements, register encoding, or sign extension. 


Almost all instructions referring to an operand in memory have an addressing mode byte follow- 
ing the primary opcode byte(s). This byte, the mod r/m byte, specifies the address mode to be 
used. Certain encodings of the mod r/m byte indicate a second addressing byte, the scale-index- 
base byte, that follows the mod r/m byte to fully specify the addressing mode. 


Addressing modes can include a displacement immediately following the mod r/m byte or scaled 
index byte. When a displacement exists, the possible sizes are 8, 16, or 32 bits. 


When the instruction specifies an immediate operand, the it follows any displacement bytes. The 
immediate operand, when specified, is always the last field of the instruction. 


Figure 12-1 illustrates several of the fields that can appear in an instruction, such as the mod field 
and the r/m field, but the figure does not show all fields. Several smaller fields also appear in cer- 
tain instructions, sometimes within the opcode bytes themselves. Table 12-1 is a complete list of 
all fields appearing in the Intel486 processor instruction set. Following Table 12-1 are detailed 
tables for each field. 


12-2 


intel. INSTRUCTION SET SUMMARY 


TTTTTTTT | TTTTTTTT mod TT T r/m ssindex base | d32 | 16 |e | none data32 | 16 |e | none 
7 0 7 0 7654320 7654320 
| SS EE I 
T T T 
opcode “mod r/m” “s-i-b” address immediate 
(one or two bytes) byte byte displacement data 
(T represents an (4, 2, 1 bytes (4, 2, 1 bytes 
opcode bit.) | or none) or none) 


register and address 
mode specifier 


Figure 12-1. General Instruction Format 


Table 12-1. Fields within Intel486™ Processor Instructions 


fs Description peony a 
Ww Specifies whether data is byte or full size (full size is either 16 or 32 bits) 1 
d Specifies direction of data operation 1 
Ss Specifies whether an immediate data field must be sign-extended 1 
reg General register specifier 3 
mod r/m | Address mode specifier (effective address can be a general register) 2 for mod; 
3 for r/m 
ss Scale factor for scaled index address mode 2 


index |General register to be used as index register 
base | General register to be used as base register 


sreg2 | Segment register specifier for CS, SS, DS, ES 
sreg3 | Segment register specifier for CS, SS, DS, ES, FS, GS 
tttn For conditional instructions, specifies a condition asserted or a condition negated 


&]_ WO] Pp] CO] & 


NOTE: Tables 12-15 through 12-19 show encoding of individual instructions. 


12.2.2 32-Bit Extensions of the Instruction Set 


With the Intel486 processor, the 8086/80 186/80286 instruction set is extended in two orthogonal 
directions: 32-bit forms of all 16-bit instructions are added to support the 32-bit data types, and 
32-bit addressing modes are made available for all instructions referencing memory. This orthog- 
onal instruction set extension is accomplished having a Default (D) bit in the code segment de- 
scriptor, and by having two prefixes to the instruction set. 
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Whether the instruction defaults to operations of 16 bits or 32 bits depends on the setting of the 
D bit in the code segment descriptor, which gives the default length (either 32 bits or 16 bits) for 
both operands and effective addresses when executing that code segment. In Real Address Mode 
or Virtual 8086 Mode, no code segment descriptors are used, but the Intel486 processor assumes 
a D value of 0 when operating in those modes (for 16-bit default sizes compatible with the 
8086/80186/80286). 


Two prefixes, the Operand Size Prefix and the Effective Address Size Prefix, allow overriding 
individually the Default selection of operand size and effective address size. These prefixes may 
precede any opcode bytes and affect only the instruction they precede. If necessary, one or both 
of the prefixes may be placed before the opcode bytes. The Operand Size Prefix and the Effective 
Address Prefix toggle the operand size or the effective address size, respectively, to the value 
“opposite” the Default setting. For example, when the default operand size is for 32-bit data op- 
erations, the presence of the Operand Size Prefix toggles the instruction to 16-bit data operation. 
When the default effective address size is 16 bits, the presence of the Effective Address Size pre- 
fix toggles the instruction to use 32-bit effective address computations. 


These 32-bit extensions are available in all Intel486 processor modes, including Real Address 
Mode or Virtual 8086 Mode. In these modes the default is always 16 bits, so prefixes are needed 
to specify 32-bit operands or addresses. For instructions with more than one prefix, the order of 
prefixes is unimportant. 


Unless specified otherwise, instructions with 8-bit and 16-bit operands do not affect the contents 
of the high-order bits of the extended registers. 
12.2.3 Encoding of Integer Instruction Fields 


Within the instruction are several fields that indicate register selection, addressing mode and so 
on. The exact encodings of these fields are defined in this section. 


12.2.3.1 Encoding of Operand Length (w) Field 


For any given instruction that performs a data operation, the instruction executes as a 32-bit op- 
eration or a 16-bit operation. Within the constraints of the operation size, the w field encodes the 
operand size as either one byte or the full operation size, as shown in Table 12-2. 


Table 12-2. Encoding of Operand Length (w) Field 


w Field Operand Size during 16-Bit Data Operand Size during 32-Bit Data 
Operations Operations 
0 8 Bits 8 Bits 


1 16 Bits 32 Bits 
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12.2.3.2 | Encoding of the General Register (reg) Field 


INSTRUCTION SET SUMMARY 


The general register is specified by the reg field, which may appear in the primary opcode bytes, 
as the reg field of the “mod r/m” byte, or as the r/m field of the “mod r/m” byte. 


Table 12-3. Encoding of reg Field when the (w) Field is Not Present in Instruction 


reg Field Seni during 16-Bit Register Selected during 32-Bit 
ata Operations Data Operations 
000 AX EAX 
001 CX ECX 
010 DX EDX 
011 BX EBX 
100 SP ESP 
101 BP EBP 
110 Sl ESI 
111 DI EDI 


Table 12-4. Encoding of reg Field when the (w) Field is Present in Instruction 


Register Specified by reg Field during 16-Bit Data Operations: 
Function of w Field 
reg 
(when w = 0) (when w = 1) 

000 AL AX 
001 CL CX 
010 DL DX 
011 BL BX 
100 AH SP 
101 CH BP 
110 DH Sl 
111 BH DI 
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Table 12-4. Encoding of reg Field when the (w) Field is Present in Instruction 


Register Specified by reg Field during 32-Bit Data Operations 


Function of w Field 
reg 
(when w = 0) (when w = 1) 

000 AL EAX 
001 CL ECX 
010 DL EDX 
011 BL EBX 
100 AH ESP 
101 CH EBP 
110 DH ESI 
111 BH EDI 


12.2.3.3. Encoding of the Segment Register (sreg) Field 


The sreg field in certain instructions is a 2-bit field allowing one of the four 80286 segment reg- 
isters to be specified. The sreg field in other instructions is a 3-bit field, allowing the Intel486 pro- 
cessor FS and GS segment registers to be specified. 


Table 12-5. 2-Bit sreg2 Field 


2-bit sreg2 Field Segment Register Selected 
00 ES 
01 cS 
10 Ss 
11 DS 


Table 12-6. 3-Bit sreg3 Field 


3-bit sreg3 Field Segment Register Selected 
000 ES 
001 cs 
010 Ss 


101 GS 
110 do not use 
111 do not use 
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12.2.3.4 Encoding of Address Mode 


Except for special instructions, such as PUSH or POP, where the addressing mode is pre-deter- 
mined, the addressing mode for the current instruction is specified by addressing bytes following 
the primary opcode. The primary addressing byte is the “mod r/m” byte, and a second byte of ad- 
dressing information, the “s-i-b” (scale-index-base) byte, can be specified. 


The s-i-b (scale-index-base byte) byte is specified when using 32-bit addressing mode and the 
“mod 1/m” byte has r/m = 100 and mod = 00, 01 or 10. When the sib byte is present, the 32-bit 
addressing mode is a function of the mod, ss, index, and base fields. 


The primary addressing byte, the “mod 1/m” byte, also contains three bits (shown as TTT in Fig- 
ure 12-1) sometimes used as an extension of the primary opcode. The three bits, however, may 
also be used as a register field (reg). 


When calculating an effective address, either 16-bit addressing or 32-bit addressing is used. 
16-bit addressing uses 16-bit address components to calculate the effective address, and 32-bit 
addressing uses 32-bit address components to calculate the effective address. When 16-bit ad- 
dressing is used, the “mod r/m” byte is interpreted as a 16-bit addressing mode specifier. When 
32-bit addressing is used, the “mod r/m” byte is interpreted as a 32-bit addressing mode specifier. 


Tables 12-7, 12-8, and 12-9 define encodings of all 16-bit and 32-bit addressing modes. 
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Table 12-7. Encoding of 16-Bit Address Mode with “mod r/m” Byte 


mod r/m Effective Address mod r/m Effective Address 
00 000 DS:[BX+SI] 10 000 DS:[BX+Sl+d16] 
00 001 DS:[BX+DI] 10 001 DS:[BX+Dl+d16] 
00 010 SS:[BP+SI] 10 010 SS:[BP+Sl+d16] 
00 011 SS:[BP+DI]] 10011 SS:[BP+Dl+d16] 
00 100 DS:[SI] 10 100 DS:[Sl+d16] 
00 101 DS [DI] 10 101 DS:[Dl+d16] 
00 110 DS:d16 10110 SS:[BP+d16] 
00 111 DS:[BX] 10111 DS:[BX+d16] 
01 000 DS:[BX+SI+d8] 11 000 register-see below 
01 001 DS:[BX+Dl+d8] 11001 register-see below 
01 010 SS :[BP+Sl+d8] 11010 register-see below 
01011 SS:[BP+Dl+d8] 11011 register—see below 
01 100 DS:[Sl+d8] 11 100 register-see below 
01 101 DS:[D1l+d8] 11101 register—see below 
01 110 SS:[BP+d8] 11110 register-see below 
01111 DS:[BX+d8] 11111 register—see below 
Register Specified by r/m during Register Specified by r/m during 
16-Bit Data Operations 32-Bit Data Operations 
Function of w Field Function of w Field 
mod r/m mod r/m 
(when w=0) | (when w =1) (when w=0) | (when w =1) 
11 000 AL AX 11 000 AL EAX 
a 
11010 DL DX 11010 DL EDX 
[om |e |x| [mon |e | ex | 
11 100 AH SP 11 100 AH ESP 
11101 CH BP 11101 CH EBP 
11110 DH Sl 11110 DH ESI 
11111 BH DI 11111 BH EDI 
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(No “s-i-b” Byte Present) 


Table 12-8. Encoding of 32-Bit Address Mode with “mod r/m” Byte 


INSTRUCTION SET SUMMARY 


mod r/m Effective Address mod r/m Effective Address 
00 000 DS:[EAX] 10 000 DS:[EAX+d32] 
00 001 DS:[ECX] 10 001 DS:[ECX+d32] 
00 010 DS:[EDX] 10 010 DS:[EDX+d32] 
00 011 DS:[EBX] 10011 DS:[EBX+d32] 
00 100 s-i-b is present 10 100 s-i-b is present 
00 101 DS:d32 10 101 SS:[EBP+d32] 
00 110 DS:[ESI] 10110 DS [ESI+d32] 
00 111 DS:[EDI] 10111 DS:[EDI+d32] 
01 000 DS:[EAX+d8] 11 000 register-see below 
01 001 DS:[ECX+d8] 11 001 register-see below 
01010 DS:[EDX+d8] 11010 register-see below 
01 011 DS:[EBX+d8] 11011 register—see below 
01 100 s-i-b is present 11 100 register-see below 
01 101 SS:[EBP+d8] 11101 register-see below 
01110 DS:[ESI+d8] 11110 register-see below 
01111 DS:[EDI+d8] 11111 register-see below 


Register Specified by reg or r/m 
during 16-Bit Data Operations: 


Function of w field 


Register Specified by reg or r/m 
during 32-Bit Data Operations: 


Function of w field 


mod r/m mod r/m 
(when w=0) (when w=1) (when w=0) (when w=1) 
11 000 AL AX 11 000 AL EAX 
11001 CL CX 11001 CL ECX 
11010 DL DX 11010 DL EDX 
11011 BL BX 11011 BL EBX 
11 100 AH SP 11 100 AH ESP 
11101 CH BP 11101 CH EBP 
11110 DH Sl 11.110 DH ESI 
11111 BH DI 11111 BH EDI 
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Table 12-9. Encoding of 32-Bit Address Mode 
(“mod r/m” Byte and “s-i-b” Byte Present) 


mod base Effective Address ss Scale Factor 

00 000 DS:[EAX+(scaled index)] 00 x1 

00 001 DS:[ECX+(scaled index)] 01 x2 

00 010 DS:[EDX+(scaled index)] 10 x4 

00 011 DS:[EBX+(scaled index)] 11 x8 

00 100 SS:[ESP+(scaled index)] Index Index Register 

00 101 DS:[d32+(scaled index)] 000 EAX 

00 110 DS:[ESI+(scaled index)] 001 ECX 

00111 DS:[EDI+(scaled index)] 010 EDX 

01 000 DS:[EAX+(scaled index)+d8] 011 EBX 

01 001 DS:[ECX+(scaled index) +d8] 100 no index regt 

01010 DS:[EDX+(scaled index)+d8] 101 EBP 

01011 DS:[EBX+(scaled index)+d8] 110 ESI 

01 100 SS:[ESP+(scaled index)+d8] 111 EDI 

01 101 SS:[EBP+(scaled index)+d8] NOTE: When index field is 100, indicating “no 
index register,” then ss field MUST equal 
00. When index is 100 and ss does not 
equal 00, the effective address is 
undefined. 

01 110 DS:[ESI+(scaled index)+d8] 

01111 DS:[EDI+(scaled index)+d8] 

10 000 DS:[EAX+(scaled index)+d32] 

10 001 DS:[ECX+(scaled index)+d32] 

10010 DS:[EDX+(scaled index)+d32] 

10011 DS:[EBX+(scaled index)+d32] 

10 100 SS:[ESP+(scaled index)+d32] 

10 101 SS:[EBP+(scaled index)+d32] 

10 110 DS:[ESI+(scaled index)+d32] 


10111 


DS:[EDI+(scaled index)+d32] 


NOTE: Mod field in “mod r/m” byte; ss, index, 
base fields in “s-i-b” byte. 
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12.2.3.5 Encoding of Operation Direction (d) Field 


In many two-operand instructions the d field is present to indicate which operand is considered 
the source and which is the destination. 


Table 12-10. Encoding of Operation Direction (d) Field 


d Direction of Operation 


Register/Memory < Register “reg” Field 
Indicates Source Operand; “mod r/m” or 
“mod ss index base” Indicates Destination 
Operand 


1 Register < Register/Memory “reg” Field 
Indicates Destination Operand; “mod r/m” 
or “mod ss index base” Indicates Source 
Operand 


12.2.3.6 | Encoding of Sign-Extend (s) Field 


The s field occurs primarily to instructions with immediate data fields. The s field has an effect 
only when the size of the immediate data is 8 bits and is being placed in a 16-bit or 32-bit desti- 
nation. 


Table 12-11. Encoding of Sign-Extend (s) Field 


Effect on Immediate Data 8 Effect on Immediate Data 16 | 32 
0 None None 
1 Sign-Extend Data 8 to Fill 16-bit or None 


32-bit Destination 
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12.2.3.7 Encoding of Conditional Test (tttn) Field 


For the conditional instructions (conditional jumps and set on condition), tttn is encoded with n, 
indicating to use the condition (n=0) or its negation (n=1), and ttt, indicating the condition to test. 


Table 12-12. Encoding of Conditional Test (tttn) Field 


Mnemonic Condition tttn 
oO Overflow 0000 
NO No Overflow 0001 
B/NAE Below/Not Above or Equal 0010 
NB/AE Not Below/Above or Equal 0011 
E/Z Equal/Zero 0100 
NE/NZ Not Equal/Not Zero 0101 
BE/NA Below or Equal/Not Above 0110 
NBE/A Not Below or Equal/Above 0111 
S) Sign 1000 
NS Not Sign 1001 
P/PE Parity/Parity Even 1010 
NP/PO Not Parity/Parity Odd 1011 
L/NGE Less Than/Not Greater or Equal 1100 
NL/GE Not Less Than/Greater or Equal 1101 
LE/NG Less Than or Equal/Greater Than 1110 
NLE/G Not Less or Equal/Greater Than 1111 
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12.2.3.8 


This field is used for loading and storing the Control, Debug and Test registers. 


Table 12-13. Encoding of Control or Debug or Test Register (eee) Field 


INSTRUCTION SET SUMMARY 


Encoding of Control or Debug or Test Register (eee) Field 


eee Code 


TTReg Name 


When Interpreted as Control Register Field: 


000 CRO 
010 CR2 
011 CR3 


When Interpreted as Debug Register Field: 


000 DRO 
001 DR1 
010 DR2 
011 DR3 
110 DR6 
111 DR7 


When Interpreted as 


Test Register Field: 


011 TR3 
100 TR4 
101 TRS 
110 TR6 
111 TR7 


NOTE: Do not use any other encoding 
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12.2.4 Encoding of Floating-Point Instruction Fields 


Instructions for the FPU assume one of the five forms shown in Table 12-14. In all cases, instruc- 
tions are at least two bytes long and begin with the bit pattern 11011B. 


Table 12-14. Encoding of Floating-Point Instruction Fields 


Instruction Optional 
First Byte Second Byte Picids 
1 11011 OPA 1 mod 1 | OPB r/m s-i-b disp 
2 11011 MF OPA mod OPB r/m s-i-b disp 
3 11011 d P OPA 1 1 OPB ST(i) 
4 11011 0 OP 
5 11011 OP 
15-11 10 9 8 7 6 5 43 21 0 
OP = Instruction opcode, possible split into two fields OPA and OPB 
MF = Memory Format 


00—32-bit real 
01—32-bit integer 
10—64-bit real 
11—16-bit integer 
P = Pop 
0-Do not pop stack 
1—-Pop stack after operation 


d = Destination 
0—Destination is ST(O) 
1—Destination is ST(i) 


R XORd = 0-Destination (op) Source 


rs) 
~< 
(o) 
rs) 
Q 
iI 


1-Source (op) Destination 


n 
4 

canny 

~~ 
ll 


Register stack element i 
000 = Stack top 

001 = Second stack element 
111 = Eighth stack element 


The mod (Mode field) and 1/m (Register/Memory specifier) have the same interpretation as the 
corresponding fields of the integer instructions. 
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The s-i-b (Scale Index Base) byte and disp (displacement) are optionally present in instructions 
that have mod and r/m fields. Their presence depends on the values of mod and r/m, as for integer 
instructions. 


12.3 CLOCK COUNT SUMMARY 


To calculate elapsed time for an instruction, multiply the instruction clock count, as listed in Ta- 
bles 12-15 through 12-19, by the processor core clock period (e.g., 10 ns for a 100-MHz IntelDX4 
processor). 


12.3.1 Instruction Clock Count Assumptions 


The Intel486 processor instruction core clock count tables give clock counts assuming data and 
instruction accesses hit in the cache. The combined instruction and data cache hit rate is greater 
than 90%. 


A cache miss forces the Intel486 processor to run an external bus cycle. The 32-bit burst bus is 
defined as r-b-w, where: 


r=  Thenumber of bus clocks in the first cycle of a burst read or the number of clocks per data 
cycle in a non-burst read. 


b= The number of bus clocks for the second and subsequent cycles in a burst read. 
w= The number of bus clocks for a write. 


The clock counts in the cache miss penalty column assume a 2-1-2 bus. For slower buses add r-2 
clocks to the cache miss penalty for the first dword accessed. Other factors also affect instruction 
clock counts. 


Instruction Clock Count Assumptions 


1. The external bus is available for reads or writes at all times; otherwise, add bus clocks to 
reads until the bus is available. 


2. Accesses are aligned. Add three core clocks to each misaligned access. 


3. Cache fills complete before subsequent accesses to the same line. When a read misses the 
cache during a cache fill due to a previous read or pre-fetch, the read must wait for the 
cache fill to complete. When a read or write accesses a cache line still being filled, it must 
wait for the fill to complete. 


4. When an effective address is calculated, the base register is not the destination register of 
the preceding instruction. When the base register is the destination register of the 
preceding instruction, add 1 to the core clock counts shown. Back-to-back PUSH and POP 
instructions are not affected by this rule. 


5. An effective address calculation uses one base register and does not use an index register. 
However, when the effective address calculation uses an index register, one core clock 
may be added to the clock count shown. 
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10. 


11. 


12. 


12-16 


The target of a jump is in the cache. If not, add r clocks for accessing the destination 
instruction of a jump. When the destination instruction is not completely contained in the 
first dword read, add a maximum of 3b bus clocks. When the destination instruction is not 
completely contained in the first 16 byte burst, add a maximum of r+3b bus clocks. 


If no write buffer delay occurs, w bus clocks are added only when all write buffers are full. 


Displacement and immediate must not be used together. If displacement and immediate 
are used together, one core clock may be added to the core clock count shown. 


No invalidate cycles. Add a delay of one bus clock for each invalidate cycle if the 
invalidate cycle contends for the internal cache/external bus when the Intel486 processor 
needs to use it. 


Page translation hits in TLB. A TLB miss adds 13, 21 or 28 bus clocks + | possible core 
clock to the instruction depending on whether the Accessed and/or Dirty bit in neither, 
one, or both of the page entries must be set in memory. This assumes that neither page 
entry is in the data cache and a page fault does not occur on the address translation. 


No exceptions are detected during instruction execution. Refer to Table 12-17 for extra 
clocks when an interrupt is detected. 


Instructions that read multiple consecutive data items (i.e., task switch, POPA, etc.) and 
miss the cache are assumed to start the first access on a 16-byte boundary. If not, an extra 
cache line fill may be necessary, which may add up to (r+3b) bus clocks to the cache miss 
penalty. 


intel. 


Table 12-15. Clock Count Summary (Sheet 1 of 17) 


INSTRUCTION SET SUMMARY 


Instruction Format eee itcache Notes 
Miss 
INTEGER OPERATIONS 
MOV = Move: 
regi to reg2 1000 100w : 11 regi reg2 1 
reg2 to reg 1000 101w : 11 regi reg2 1 
memory to reg 1000 100w : mod reg r/m 1 2 
Immediate to reg 1100 011w : 11000 reg : immediate data 1 
or 1011W reg : immediate data 1 
Immediate to Memory 1100 01w : mod 000 r/m : displacement 1 
immediate 
Memory to Accumulator 1010 000w : full displacement 1 2 
Accumulator to Memory 1010 001w : full displacement 1 
MOVSX/MOVZX = Move with Sign/Zero Extension 
reg2 to reg1 0000 1111: 1011 z11w: 11 reg1 reg2 3 
memory to reg 0000 1111 : 1011 z11w: mod reg r/m 3 2 
Z_ instruction 
0 MOVZX 
1 MOVSX 
PUSH = Push 
reg 1111 1111:11 110 reg 4 
or 01010 reg 1 
memory 11111111 : mod 110 r/m 4 1 1 
immediate 0110 10s0 : immediate data 1 
PUSHA = Push All 0110 0000 11 
POP = Pop 
reg 1000 1111: 11 000 reg 4 1 
or 01011 reg 1 2 
memory 1000 1111 : mod 000 r/m 5 2 1 
POPA = Pop All 0110 0001 9 7/15 16/32 
XCHG = Exchange 
reg1 with reg2 1000 011w : 11 regi reg2 3 
Accumulator with reg 10010 reg 3 
Memory with reg 1000 011w : mod reg r/m 5 
NOP = No Operation 1001 0000 1 
NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
12-17 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


Table 12-15. Clock Count Summary (Sheet 2 of 17) 


Penalty 
Instruction Format care if Cache Notes 
Miss 
LEA = Load EA to Register 1000 1101 : mod reg r/m 
no index register 1 
with index register 2 
Instruction TIT 
ADD = Add 000 
ADC = Add with Carry 010 
AND = Logical AND 100 
OR = Logical OR 001 
SUB = Subtract 101 
SBB = Subtract with Borrow 011 
XOR = Logical Exclusive OR 110 
reg1 to reg2 OOTT TOOw : 11 regi reg2 1 
reg2 to reg1 OOTT TO1w : 11 regi reg2 1 
memory to register OOTT TO1w: mod reg r/m 2 2 
register to memory OOTT TOOw : mod reg r/m 3 6/2 U/L 
immediate to register 1000 00sw : 11 TTT reg : immediate 1 
register 
immediate to Accumulator OOTT T10w : immediate data 1 
immediate to memory 1000 00sw : mod TTT r/m : immediate data 3 6/2 U/L 
Instruction TTT 
INC = Increment 000 
DEC = Decrement 001 
reg 1111 111w:11 TTT reg 1 
or O1TTT reg 1 
memory 1111 111w: mod TTT r/m 3 6/2 U/L 
Instruction TTT 
NOT = Logical Complement 010 
NEG = Negate 011 
reg 1111 011w:11 TTT reg 1 
memory 1111 011w : mod TTT r/m 3 6/2 U/L 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Table 12-15. Clock Count Summary (Sheet 3 of 17) 


Instruction Format eae itcache Notes 
Miss 
CMP = Compare 
reg1 with reg2 0011 100w : 11 regi reg2 1 
reg2 with reg1 0011 101w: 11 regi reg2 1 
memory with register 0011 100w : mod reg r/m 2 
register with memory 0011 101w : mod reg r/m 2 
immediate with register 1000 00sw : 11 111 reg : immediate data 1 
immediate with acc. 0011 110w : immediate data 1 
immediate with memory 1000 00sw : mod 111 r/m : immediate data 2 2 
TEST = Logical Compare 
regi and reg2 1000 010w : 11 regi reg2 1 
memory and register 1000 010w : mod reg r/m 2 2 
immediate and register 1111 011w: 11 000 reg : immediate data 1 
immediate and acc. 1010100w : immediate data 1 
immediate and memory ~—_1111 011w: mod 000 r/m : immediate data 2 2 
MUL = Multiply (unsigned) 
acc. with register 1111 011w:11 100 reg 
Multiplier-Byte 13/18 MN/MX,3 
Word 13/26 MN/MX,3 
Dword 13/42 MN/MX,3 
acc. with memory 1111 011w : mod 100 r/m 
Multiplier-Byte 13/18 1 MN/MX,3 
Word 13/26 1 MN/MX,3 
Dword 13/42 1 MN/MX,3 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Table 12-15. Clock Count Summary (Sheet 4 of 17) 


Penalty 
Instruction Format See if Cache Notes 
Miss 
IMUL = Integer Multiply (unsigned) 
acc. with register 1111 011w: 11 101 reg 
Multiplier-Byte 13/18 MN/MX,3 
Word 13/26 MN/MX,3 
Dword 13/42 MN/MX,3 
acc. with memory 1111 .011w :mod 101 r/m 
Multiplier-Byte 13/18 MN/MX,3 
Word 13/26 MN/MX,3 
Dword 13/42 MN/MX,3 
reg1 with reg2 0000 1111 : 10101111: 11 reg1 reg2 
Multiplier-Byte 13/18 MN/MX,3 
Word 13/26 MN/MX,3 
Dword 13/42 MN/MX,3 
register with memory 0000 1111 : 10101111 : mod reg r/m 
Multiplier-Byte 13/18 1 MN/MX,3 
Word 13/26 1 MN/MX,3 
Dword 13/42 1 MN/MX,3 
reg1 with imm. to reg2 0110 10s1 : 11 regi reg2 : immediate data 
Multiplier-Byte 13/18 MN/MX,3 
Word 13/26 MN/MX,3 
Dword 13/42 MN/MX,3 
mem. with imm. to reg. 0110 10s1 : mod reg r/m : immediate data 
Multiplier-Byte 13/18 MN/MX,3 
Word 13/26 MN/MX,3 
Dword 13/42 MN/MX,3 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Table 12-15. Clock Count Summary (Sheet 5 of 17) 


Penalty 
Instruction Format eae if Cache Notes 
Miss 
For the IntelDX4™ Processor Only: 
IMUL = Integer Multiply (signed) 
acc. with register 1111 011w:11 101 reg 
Multiplier-Byte 5/5 MN/MX,3 
Word 5/6 MN/MX,3 
Dword 6/12 MN/MX,3 
acc. with memory 1111 011w: mod 1 01 r/m 
Multiplier-Byte 5/5 MN/MX,3 
Word 5/6 MN/MX,3 
Dword 6/12 MN/MX,3 
reg1 with reg2 0000 1111: 1010 1111: 11 regi reg2 
Multiplier-Byte 5/5 MN/MX,3 
Word 5/6 MN/MX,3 
Dword 6/12 MN/MX,3 
register with memory 0000 1111 : 1010 1111 : mod reg r/m 
Multiplier-Byte 5/5 MN/MX,3 
Word 5/6 MN/MX,3 
Dword 6/12 MN/MX,3 
reg1 with imm. to reg2 0110 10s1 : 11 regi reg2 : immediate data 
Multiplier-Byte 5/5 MN/MX,3 
Word 5/6 MN/MX,3 
Dword 6/12 MN/MX,3 
mem. with imm. to reg. 0110 10s1 : mod reg r/m : immediate data 
Multiplier-Byte 5/5 MN/MX,3 
Word 5/6 MN/MX,3 
Dword 6/12 MN/MX,3 
DIV = Divide (unsigned) 
acc. by register 1111 011w: 1111 0 reg 
Divisor-Byte 16 
Word 24 
Dword 40 
acc. by memory 1111 011w:mod110r/m 
Divisor-Byte 16 
Word 24 
Dword 40 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Table 12-15. Clock Count Summary (Sheet 6 of 17) 


Penalty 
Instruction Format ae if Cache Notes 
Miss 
IDIV = Integer Divide 
(signed) 
acc. by register 1111 011w:1111 1 reg 
Divisor-Byte 19 
Word 27 
Dword 43 
acc. by memory 1111 011w: mod 11 1 r/m 
Divisor-Byte 20 
Word 28 
Dword 44 
CBW = Convert Byte to Word 1001 1000 3 
CWD = Convert Word to Dword 
1001 1001 3 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Table 12-15. Clock Count Summary (Sheet 7 of 17) 


INSTRUCTION SET SUMMARY 


Penalty 
Instruction Format See if Cache Notes 
Miss 
Instruction TTT 
ROL = Rotate Left 000 
ROR = Rotate Right 001 
RCL = Rotate Through Carry 010 
Left 
RDR = Rotate Through 011 
Carry Right 
SHL/SAL = Shift Logical/ 100 
Arithmetic Left 
SHR = Shift Logical Right 101 
SAR = Shift Arithmetic Right 111 
Not Through Carry (ROL, ROR, SAR, SHL, and SHR) 
reg by 1 1101 000w : 11 TTT reg 3 
memory by 1 1101 000w : mod TTT r/m 4 6 
reg by CL 1101 001w: 11 TTT reg 3 
memory by CL 1101 001w : mod TTT r/m 4 6 
reg by immediate count 1100 000w : 11 TTT reg : imm. 8-bit data 2 
mem by immediate count 1100 000w : mod TTT r/m: imm. 8-bit data 4 6 
Through Carry (RCL and RCR) 
reg by 1 1101 000w : 11 TTT reg 3 
memory by 1 1101 000w : mod TTT r/m 4 6 
reg by CL 1101 001w: 11 TTT reg 8/30 MN/MX,4 
memory by CL 1101 001w : mod TTT r/m 9/31 MN/MX,5 
reg by immediate count 1100 000w : 11 TTT reg : imm. 8-bit data 8/30 MN/MX,4 
mem by immediate count 1100 000w : mod TTT r/m:imm. 8-bit data] 9/31 MN/MX,5 
Instruction TIT 
SHLD = Shift Left Double 100 
SHRD = Shift Right Double 101 
register with immediate 0000 1111: 10TT T100: 11 reg2 reg1 2 
:imm. 8-bit data 
memory with immediate 0000 1111 : 10TT T100 : mod reg r/m 3 6 
:imm. 8-bit data 
register by CL 0000 1111 : 10TT T101 : 11 reg2 reg1 
memory by CL 0000 1111: 10TT T101 : mod reg r/m 5 
NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Instruction Format ae itcache Notes 
Miss 
BSWAP = Byte Swap 0000 1111 : 11001 reg 1 
XADD = Exchange and Add 
regi, reg2 0000 1111 : 1100 000w : 11 reg2 reg 
memory, reg 0000 1111 : 1100 000w : mod reg r/m 6/2 U/L 


CMPXCHG = Compare and Exchange 


reg1, reg2 0000 1111 : 1011 000w : 11 reg2 reg! 6 
memory, reg 0000 1111 : 1011 000w : mod reg r/m 7/10 2 6 
CONTROL TRANSFER (within segment) 
Note: Times are jump taken/not taken 
Jcccc = Jump on cccc 
8-bit displacementt 0111 tttn : 8-bit disp. 3/1 T/NT,23 
full displacement 0000 1111 : 1000 tttn : full displacement 3/1 T/NT,23 
Note: Times are jump taken/not taken 
SETcccc = Set Byte on cccc (Times are cccc true/false) 
reg 0000 1111 : 1001 tttn : 11 000 reg 4/3 
memory 0000 1111 : 1001 tttn : mod 0000 r/m 3/4 
Mnemonic cccc Condition tttn 
O Overflow 0000 
NO No Overflow 0001 
B/NAE Below/Not Above or Equal 0010 
NB/AE Not Below/Above or Equal 0011 
E/Z Equal Zero 0100 
NE/NZ Not Equal/Not Zero 0101 
BE/NA Below or Equal/Not Above 0110 
NBE/A Not Below or Equal/Above 0111 
S) Sign 1000 
NS Not Sign 1001 
P/PE Parity/Parity Even 1010 
NP/PO Not Parity/Parity Odd 1011 
L/INGE Less Than/Not Greater or Equal 1100 
NL/GE Not Less Than/Greater or Equal 1101 
LE/NG Less Than or Equal/Greater Than 1110 
NLE/G Not Less Than or 
Equal/Greater Than 1111 
LOOP = LOOP CX Times 1110 0010 : 8-bit disp. 7/6 L/NL,23 
LOOPZ/LOOPE = Loop with Zero/Equal 
1110 0001 : 8-bit disp. 9/6 L/NL,23 
LOOPNZ/LOOPNE = Loop While Not Zero 
1110 0000 : 8-bit disp. 9/6 L/NL,23 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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INSTRUCTION SET SUMMARY 


Instruction Format vee itcache Notes 
Miss 
JCXZ = Jump on CX Zero —-: 11110 0011 : 8-bit disp. 8/5 T/NT,23 
JECXZ = Jump on ECX 1110 0011 : 8-bit disp. 8/5 T/NT,23 
Zero 
(Address Size Prefix Differentiates JCXZ for JECXZ) 
JMP = Unconditional Jump (within segment) 
Short 1110 1011 : 8-bit disp. 3 7,23 
Direct 1110 1001 : full displacement 3 7,23 
Register Indirect 1111 1111:11 100 reg 5 7,23 
Memory Indirect 1111 1111 : mod 100 r/m 5 5 7 
CALL = Call (within segment) 
Direct 1110 1000 : full displacement 3 7,23 
Register Indirect 11111111 :11010 reg 7,23 
Memory Indirect 1111 1111 : mod 010 reg 5 7 
RET = Return from CALL (within segment) 
1100 0011 
Adding Immediate to SP 1100 0010 : 16-bit disp. 
ENTER = Enter Procedure 1100 1000: 16-bit disp., 8-bit level 
Level = 0 14 
Level = 1 17 
Level (L) > 1 174+3L 8 
LEAVE = Leave Procedure 1100 1001 5 1 
MULTIPLE-SEGMENT INSTRUCTIONS 
MOV = Move 
reg. to segment reg. 1000 1110: 11 sreg3 reg 3/9 0/3 RV/P,9 
memory to segment reg. 1000 1110 : mod sreg3 r/m 3/9 2/5 RV/P,9 
segment reg. to reg. 1000 1100 : 11 sreg3 reg 3 
segment reg. to memory 1000 1100: mod sreg3 r/m 3 
PUSH = Push 
segment reg. 000sreg 2110 3 
(ES, CS, SS, or DS) 
segment reg. (FS or GS) 0000 1111 : 10 sreg3001 3 
NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Penalty 
Instruction Format ae if Cache Notes 
Miss 
POP = Pop 
segment reg. 000sreg 2111 3/0 2/5 RV/P,9 
(ES, CS, SS, or DS) 
segment reg. (FS or GS) 0000 1111 : 10 sreg3001 3/9 2/5 RV/P,9 
LDS = Load Pointer toDS 11000101 : mod reg r/m 6/12 7/10 RV/P,9 
LES = Load Pointer toES 11000100: mod reg r/m 6/12 7/10 RV/P,9 
LFS = Load Pointer to FS 0000 1111 : 1011 0100 : mod reg r/m 
LGS = Load Pointer toGS 0000 1111: 10110101 : mod reg r/m 
LSS = Load PointertoSS 00001111: 10110010: mod reg r/m 6/12 7/10 RV/P,9 
CALL = Call 
Direct intersegment 1001 1010 : unsigned full offset, selector 18 2 R,7,22 
to same level 20 3 P,9 
thru Gate to same level 35 6 P,9 
to inner level, no parameters 69 17 P,9 
to inner level, x parameters (d) words 77+4X 17+n P,11,9 
to TSS 37+TS 3 P,10,9 
thru Task Gate 38+TS 3 P,10,9, 
Indirect intersegment 11111111 :mod 011 r/m 17 8 R,7 
to same level 20 10 P,9 
thru Gate to same level 35 13 P,9 
to inner level, no parameters 69 24 P,Q 
to inner level, x parameters (d) words 77+4X 24+4n P,11,9 
to TSS 37+TS 10 P,10,9 
thru Task Gate 38+TS 10 P,10,9, 
RET = Return from CALL 
intersegment 1100 1010 13 R,7 
to same level 17 9 P,9 
to outet lever 35 12 P,9 
intersegment adding 1100 1010 : 16-bit disp. 14 R,7 
imm. to SP 
to same level 18 9 P,9 
to outer level 36 12 P,9 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 


12-26 


intel. 


Table 12-15. Clock Count Summary (Sheet 11 of 17) 


INSTRUCTION SET SUMMARY 


Penalty 
Instruction Format eae if Cache Notes 
Miss 
JMP = Unconditional Jump 
Direct intersegment 1110 1010 : unsigned full offset, selector 17 2 R,7,22 
to same level 19 3 P,9 
thru Call Gate to same level 32 6 P,9 
thru TSS 42+TS 3 P,10,9 
thru Task Gate 43+TS 3 P,10,9, 
Indirect intersegment 11111111 :mod 011 r/m 13 9 R,7,9 
to same level 18 10 P,9 
thru Call Gate to same level 31 13 P,9 
thru TSS 41+TS 10 P,10,9 
thru Task Gate 42+TS 10 P,10,9, 
BIT MANIPULATION 
BT = Test Bit 
register, immediate 0000 1111: 1011 1010: 11 100 reg: 3 
imm. 8-bit data 
memory, immediate 0000 1111 : 1011 1010 : mod 100 r/m : 3 1 
imm. 8-bit data 
regi, reg2 0000 1111 : 1010 0011 : 11 reg2 reg1 
memory, reg 0000 1111 : 1010 0011 : mod reg r/m 2 
Instruction TTT 
BTS = Test Bit and Set 101 
BTR = Test Bit and Reset 110 
BTC = Test Bit and 
Compliment 111 
register, immediate 0000 1111: 1011 1010:11 TTT reg 6 
imm. 8-bit data 
memory, immediate 0000 1111: 1011 1010:mod TTT r/m 8 U/L 
imm. 8-bit data 
regi, reg2 0000 1111: 10TT T011:11 reg2 regi 
memory, reg 0000 1111: 10TT T011:mod reg r/m 13 U/L 
BSF = Scan Bit Forward 
reg1, reg2 0000 1111: 1011 1100: 11 reg2 reg1 6/42 MN/MX, 
12 
memory, reg 0000 1111 : 1011 1100 : mod reg r/m 7/43 2 MN/MX, 15 
BSR = Scan Bit Reverse 
reg1, reg2 0000 1111 : 1011 1101 : 11 reg2 reg1 6/103 MN/MX, 14 
memory, reg 0000 1111 : 1011 1101 : mod reg r/m 7/104 1 MN/MX, 15 
NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Instruction Format ee itcache Notes 
Miss 
STRING INSTRUCTIONS 
CMPS = Compare Byte Word 1010 011w 8 6 16 
LODS = Load Byte/Word to AL/AX/EAX 
1010 111w 5 2 


MOVS = Move Byte/Word 1010 010w 
SCAS = Scan Byte/Word 1010 111w 
STOS = Store Byte/Word from AL/AX/EX 
1010 101w 
XLAT = Translate String 1101 0111 


REPEATED STRING INSTRUCTIONS 
Repeated by Count in CX or ECX (C=Count in CX or ECX) 


REPE CMPS = Compare 1111 0011 : 1010 011w 
String 

(Find Non-match) 

C=0 

C>0 


REPNE CMPS =Compare’ 11110010: 1010 011w 
String 
(Find Match) 


C=0 


C>0 


REP LODS = Load String 1111 0010 : 1010 110w 


C=0 
C>0 


REP MOVS = Move String 1111 0010 : 1010 010w 


C=0 
C=1 
C>1 
REPE SCAS = Scan String 1111 0011:1010 111w 
(Find Non-AL/AX/EAX) 
C=0 
C>0 
REPNE SCAS = Scan String 11110010: 1010 111w 
(Find AL/AX/EAX) 
C=0 
C>0 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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INSTRUCTION SET SUMMARY 


Instruction Format eae iicache Notes 
Miss 
REP STOS = Store String 11110010: 1010 101w 
C=0 5 
C>0 7+4¢ 
FLAG CONTROL 
CLC = Clear Carry Flag 1111 1000 
STC = Set Carry Flag 1111 1001 
CMC = Complement Carry 11110101 
Flag 
CLD = Clear Direction Flag 1111 1100 
STD = Set Direction Flag 1111 1101 
CLI = Clear Interrupt Enable 1111 1010 
Flag 
STI = Set Interrupt Enable 11111011 5 
Flag 
LAHF = Load AH into Flag 1001 1111 3 
FLAG CONTROL 
SAHF = Store AH into Flag 1001 1110 2 
PUSHF = Push Flags 1001 1100 4/3 RV/P 
POFF = Pop Flags 1001 1101 9/6 RV/P 
DECIMAL ARITHMETIC 
AAA = ASCII Adjust to Add 0011 0111 
AAS = ASCII Adjust for 0011 1111 
Subtract 
AAM = ASCII Adjust for 1101 0100 : 0000 1010 15 
Multiply 
AAD = ASCII Adjust for 1101 0101 : 0000 1010 14 
Divide 
DAA = Decimal Adjust for 0010 0111 2 
Add 
DAS = Decimal Adjust for Subtract 
0010 1111 2 
PROCESSOR CONTROL INSTRUCTIONS 
HLT = Halt 1111 0100 4 
NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Instruction Format per itcache Notes 
Miss 

MOV = Move To and From Control/Debug/Test Registers 

CRO from register 0000 1111 : 0010 0010 : 11 000 reg 17 2 

CR2/CR3 from register 0000 1111 : 0010 0010: 11 eee reg 4 

Reg from CRO-3 0000 1111 : 0010 0000 : 11 eee reg 4 

DRO-3 from register 0000 1111 : 0010 0011 : 11 eee reg 10 

DR6-7 from register 0000 1111 : 0010 0011 : 11 eee reg 10 

Register from DR6-7 0000 1111 : 0010 0001 : 11 eee reg 9 

Register from DRO-3 0000 1111 : 0010 0001 : 11 eee reg 9 

TR3 from register 0000 1111 :0010 0110: 11011 reg 4 

TR4-7 from register 0000 1111 : 0010 0110: 11 eee reg 4 

Register from TR3 0000 1111 : 0010 0100 : 11 011 reg 3 

Register from TR4-7 0000 1111 : 0010 0100 : 11 eee reg 4 
CPUID = CPU Identification 0000 1111 : 1010 0010 

EAX = 1 14 

EAX = 0, >1 9 


CLTS = Clear Task 
Switched Flag 


INVD = Invalidate Data 


Cache 


0000 1111 : 0000 0110 


0000 1111 : 0000 1000 


WBINVD = Write-Back and 0000 1111 : 0000 1001 
Invalidate Data Cache 


INVLPG = Invalidate TLB Entry 


INVLPG memory 


0000 1111 : 0000 0001 : mod 111 r/m 


PREFIX BYTES 


Address Size Prefix 


0110 0111 


LOCK = Bus Lock Prefix 


1111 0000 


Operand Size Prefix 0110 0110 1 

Segment Override Prefix 
CS: 0010 1110 { 
Ds: 0011 1110 { 
ES: 0010 0110 { 
FS: 0110 0100 { 
GS: 0110 0101 { 
SS: 0011 0110 { 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Instruction Format eae itcache Notes 
Miss 

PROTECTION CONTROL 
ARPL = Adjust Requested Privilege Level 

From register 0110 0011 : 11 reg1 reg2 

From memory 0110 0011 : mod reg r/m 
LAR = Load Access Rights 

From register 0000 1111 : 0000 0010: 11 regi reg2 11 

From memory 0000 1111 : 0000 0010 : mod reg r/m 11 
LGDT = Load Global Descriptor 

Table register 0000 1111 : 0000 0001 : mod 010 r/m 12 5 
LIDT = Load Interrupt Descriptor 

Table register 0000 1111 : 0000 0001 : mod 011 r/m 12 5 
LLDT = Load Local Descriptor 

Table register from reg. 0000 1111 : 0000 0000 : 11 010 reg 11 3 

Table register from mem. 0000 1111 : 0000 0000 : mod 010 r/m 11 
LMSW = Load Machine Status Word 

From register 0000 1111 : 0000 0001 : 11 110 reg 13 

From memory 0000 1111 : 0000 0001 : mod 110 r/m 13 1 
LSL = Load Segment Limit 

From register 0000 1111 : 0000 0011 : 11 regi reg2 10 

From memory 0000 1111 : 0000 0011 : mod reg r/m 10 
LTR = Load Task Register 

From register 0000 1111 : 0000 0000 : 11 011 reg 20 

From memory 0000 1111 : 0000 0000 : mod 011 r/m 20 
SGDT = Store Global Descriptor Table 

0000 1111 : 0000 0001 : mod 000 r/m 10 
SIDT = Store Interrupt Descriptor Table 
0000 1111 : 0000 0001 : mod 001 r/m 2 

SLDT = Store Local Descriptor Table 

To register 0000 1111 : 0000 0000 : 11 000 reg 

To memory 0000 1111 : 0000 0001 : mod 000 r/m 
SMSW = Store Machine Status Word 

To register 0000 1111 : 0000 0001 : 11 000 reg 

To memory 0000 1111 : 0000 0001 : mod 100 r/m 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Instruction Format are itcache Notes 
Miss 
STR = Store Task Register 
To register 0000 1111 : 0000 0000 : 11 001 r/m 2 
To memory 0000 1111 : 0000 0000 : mod 001 r/m 
VERR = Verify Read Access 
Register 0000 1111 : 0000 0000 : 11 100 r/m 11 
Memory 0000 1111 : 0000 0000 : mod 100 r/m 11 


VERW = Verify Write Access 
To register 0000 1111 : 0000 0000 : 11 101 r/m 
To memory 0000 1111 : 0000 0000 : mod 101 r/m 
INTERRUPT INSTRUCTIONS 
INTn = Interrupt Type n 1100 1101 : type RV/P, 21 


INT3 = Interrupt Type 3 1100 1100 
INTO = Interrupt 4 if Overflow Flag Set 
1100 1110 


Taken 
Not Taken 


NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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INSTRUCTION SET SUMMARY 


Penalty 
Instruction Format sri if Cache Notes 
Miss 
BOUND = Interrupt 5 if Detect Value Out Range 
0110 0010 : mod reg r/m 
If in range 7 7 21 
If out of range INT+ 7 21 
24 
IRET = Interrupt Return 1100 1111 
Real Mode/Virtual Mode 15 8 
Protected Mode 
To same level 20 11 9 
To outer level 36 19 9 
To nested task TS+32 4 9,10 
(EFLAGS.NT=1) 
RSM = Exit System Management Mode 
0000 1111 : 1010 1010 
SMBASE Relocation 452 
Auto HALT Restart 456 
/O Trap Restart 465 
External Interrupt INT+ 21 
11 
NMI = Non-Maskable Interrupt INT+3 21 
Page Fault INT+ 21 
24 
VM86 Exceptions 
CLK INT+8 21 
STI INT+8 21 
INTn INT+9 
PUSHF INT+9 21 
POPF INT+8 21 
IRET INT+9 
IN 
Fixed Port INT+50 21 
Variable Port INT+51 21 
OUT 
Fixed Port INT+50 21 
Variable Port INT+51 21 
INS INT+50 21 
OUTS INT+50 21 
REP INS INT+51 21 
REPOUTS INT+51 21 
NOTE: See Table 12-18 for notes and abbreviations for items in this table. 
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Table 12-16. Task Switch Clock Counts 


Value for TS 
Method 
Cache Hit Miss Penalty 
VM/Intel486™ processor/286 TSS to Intel486 processor TSS 162 55 
VM/Intel486 processor/286 TSS to 286 TSS 144 31 
NOTE: See Table 12-18 for definitions and notes for items in this table. 
Table 12-17. Interrupt Clock Counts 
Value for INT 
Method 
Cache Hit | Miss Penalty Notes 
Real Mode 26 2 
Protected Mode 
Interrupt/Trap gate, same level 44 6 9 
Interrupt/Trap gate, different level 71 17 9 
Task Gate 37+TS 3 9,10 
Virtual Mode 
Interrupt/Trap gate, different level 82 17 
Task Gate 37+TS 3 10 


NOTE: See Table 12-18 for definitions and notes for items in this table. 


Table 12-18. Notes and Abbreviations (for Tables 12-15 through 12-17) (Sheet 1 of 2) 


The following abbreviations are used in Tables 12-15 through 12-17: 


Abbreviation 


16/32 
U/L 
MN/MX 
L/NL 
RV/P 


Definition 

16/32 bit modes 

unlocked/locked 

minimum/maximum 

loop/no loop 

real and virtual mode/protected mode 
real mode 

protected mode 

taken/not taken 

hit/no hit 
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The following notes refer to Tables 12-15 through 12-17 


1. Assuming that the operand address and stack address fall in different cache sets. 
2. Always locked, no cache hit case. 
3. Clocks= 10 + max(loga(|m|),n) 


4. Clocks = {quotient(count/operand length)}*7+9 
= 8 if count < operand length (8/16/32) 
5. Clocks = {quotient(count/operand length)}*7+9 


= 9 if count < operand length (8/16/32) 
6. Equal/not equal cases (penalty is the same regardless of lock) 


7. Assuming that addresses for memory read (for indirection), stack puch/pop and branch fall in different 
cache sets. 


8. Penalty for cache miss: add 6 clocks for every 16 bytes copied to new stack frame. 
9. Add 11 clocks for each unaccessed descriptor load. 

10. Refer to task switch clock counts table for value of TS. 

11. Add 4 extra clocks to the cache miss penalty for each 16 bytes. 


For notes 12-13:b=0-3, non-zero byte number); (i=0-1, non-zero nibble number); (n=0-3, non-bit 
number in nibble); 


12.Clocks =8+ 4 (b+1) + 3(i+1) + 3(n+1) 
= 6 if second operand = 0 
13. Clocks =9+4 (b+1) + 3(i+1) + 3(n+1) 
= 7 if second operand = 0 
For notes 14-15:(n=bit position 0-31) 
14. Clocks = 7 + 3(32-n) 
= 6 if second operand = 0 
15. Clocks = 8 + 3(32-n) 
= 7 if second operand = 0 
16. Assuming that the two string addresses fall in different cache sets. 
17. Cache miss penalty: add 6 clocks for every 16 bytes compared. Entire penalty on first compare. 
18. Cache miss penalty: add 2 clocks for every 16 bytes of data. Entire penalty on first load. 


19. Cache miss penalty: add 4 clocks for every 16 bytes moved (1 clock for the first operation and 3 for the 
second). 


20. Cache miss penalty: add 4 clocks for every 16 bytes scanned (2 clocks each for first and second 
operations). 


21. Refer to interrupt clock counts table for value of INT. 

22. Clock count includes one clock for using both displacement and immediate. 
23. Refer to assumption 6 in the case of a cache miss. 

24. Virtual Mode Extensions are disabled. 

25. Protected Virtual Interrupts are disabled. 
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Table 12-19. I/O Instructions Clock Count Summary 


Instruction 


Protected 
Mode 
(CPL<IOPL) 


Protected 
Mode 
(CPL>IOPL) 


Virtual 
86 
Mode 


IN = Input from: 


Fixed Port 1110 010w : port number 
Variable Port 1110 110w 


29 
28 


OUT = Output to: 
Fixed Port 1110 011w : port number 
Variable Port 1110 110w 


31 
30 


INS = Input Byte/Word from DX Port 
0110 110w 


32 


OUTS = Output Byte/Word to DX Port 
0110 111w 


32 


REP INS = Input String 
1111 0010 : 0110 110w 


REP OUTS = Output String 
1111 0010: 0110 111w 


NOTES: 
1. Two clock cache miss penalty in all cases. 
2. c=count in CX or ECX. 


3. Cache miss penalty in all modes: Add two clocks for every 16 bytes. Entire penalty on second operation. 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 1 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower | Notes 
Upper Miss Range- 
Range) Upper 
Range) 
DATA TRANSFER 
FLD = Real Load to ST(0) 
32-bit memory 11011 001 : mod 000 r/m : s-i-b/disp. 3 
64-bit memory 11011 101 : mod 000 r/m: s-i-b/disp. 3 
80-bit memory 11011011: mod 101 r/m: s-i-b/disp. 6 
ST(i) 11011 001: 11000 ST(i) 4 
FILD = Integer Load to ST(0) 
16-bit memory 11011 111 : mod 000 r/m : s-i-b/disp. 14.5(13-16) 4 
32-bit memory 11011011 : mod 000 r/m: s-i-b/disp. 11.5(9-12) 4(2-4) 
64-bit memory 11011 111 : mod 101 r/m: s-i-b/disp. 16.8(10-18) 7.8(2-8) 
FBLD = BCD Load to ST(0) 
11011 111 : mod 100 r/m: s-i-b/disp. 75(70-103) 4 7.7(2-8) 
FST = Store Real from ST(0) 
32-bit memory 11011011 : mod 010 r/m: s-i-b/disp. 
64-bit memory 11011 101 : mod 010 r/m: s-i-b/disp. 2 
ST(i) 11011 101: 11001 ST(i) 
FSTP = Store Real from ST(0) and Pop 
32-bit memory 11011011: mod 011 r/m: s-i-b/disp. 1 
64-bit memory 11011 101 : mod 011 r/m: s-i-b/disp. 2 
80-bit memory 11011011 : mod 111 r/m: s-i-b/disp. 
ST(i) 11011 101: 11001 ST(i) 
FIST = Store Integer from ST(0) 
16-bit memory 11011111: mod 010 r/m: s-i-b/disp. 33.4(29-34) 


NOTES: 

1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. 


If CW.PC indicates 24 bit precision then subtract 38 clocks. 


If CW.PC indicates 53 bit precision then subtract 11 clocks. 


NOS 


If there is a numeric error pending from a previous instruction, add 17 clocks. 
If there is a numeric error pending from a previous instruction, add 18 clocks. 
The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
If ABS(operand) is greater than 2/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 2 of 12) 


tel. 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower | Notes 
Upper Miss Range- 
Range) Upper 
Range) 
32-bit memory 11011 011 : mod 010 r/m: s-i-b/disp. 32.4(28-34) 
FISTP = Store Integer from ST(0) and Pop 
16-bit memory 11011 111 : mod 011 r/m: s-i-b/disp. 33.4(29-34) 
32-bit memory 11011 011 : mod 011 r/m: s-i-b/disp. 33.4(29-34) 
64-bit memory 11011 111 : mod 111 r/m: s-i-b/disp. 33.4(29-34) 
FBSTP = Store BCD from ST(0) and Pop 
11011 111 : mod 110 r/m: s-i-b/disp. 175(172-176) 
FXCH = Exchange ST(0) and ST(i) 
11011 001 : 11001 ST(i) 4 
COMPARISON INSTRUCTIONS 
FCOM = Compare ST(0) with Real 
32-bit memory 11011 000 : mod 010 r/m: s-i-b/disp. 2 1 
64-bit memory 11011 100 : mod 010 r/m: s-i-b/disp. 3 1 
ST(i) 11011 000 : 11010 ST(i) 
FCOMP = Compare ST(0) with Real and Pop 
32-bit memory 11011 000 : mod 011 r/m: s-i-b/disp. 1 
64-bit memory 11011 100 : mod 011 r/m: s-i-b/disp. 1 
ST(i) 11011 000 : 11011 ST(i) 1 
FCOMPP = Compare ST(0) with ST(1) and Pop Twice 
11011 110: 1101 1001 5 1 
FICOM = Compare ST(0) with Integer 
16-bit memory 11011 110: mod 010 r/m: s-i-b/disp. 18(16-20) 1 
32-bit memory 11011 010 : mod 010 r/m: s-i-b/disp. 16.5(15-17) 1 
FICOMP = Compare ST(0) with Integer 
16-bit memory 11011 110: mod 011 r/m: s-i-b/disp. 18(16-20) 2 1 


NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 


3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 
If CW.PC indicates 53 bit precision then subtract 11 clocks. 


4. lf there is a numeric error pending from a previous instruction, add 17 clocks. 

5. If there is a numeric error pending from a previous instruction, add 18 clocks. 

6. The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
7. lf ABS(operand) is greater than 7/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 3 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
32-bit memory 11011010: mod 011 r/m: s-i-b/disp. 16.5(15-17) 2 1 
FTST = Compare ST(0) with 0.0 
11011 011: 1110 0100 4 1 
FUCOM = Unordered compare ST(0) with ST(i) 
11011 101 : 11100 ST(i) 4 1 
FUCOMP = Unordered compare ST(0) with ST(i) and Pop 
11011 101 : 11101 ST(i) 4 1 
FUCOMPP = Unordered compare ST(0) with ST(1) and 
Pop Twice 
11011 101 : 11101 1001 5 1 
FXAM = Examine ST(0) 
11011 001 : 1110 0101 8 
CONSTANTS 


FLDZ = Load +0.0 Into ST(0) 


11011 001 :11101110: 
FLD1 = Load +1.0 Into ST(0) 

11011 001 : 1110 1000: 
FLDP1 = Load p Into ST(0) 

11011 001 :1110 1011: 8 2 
FLDL2T = Load log2(10) Into ST(0) 

11011 001: 1110 1001: 8 2 
FLDL2E = Load log2(e) Into ST(0) 

11011 001 :1110 1010: 8 2 
FLDLG2 = Load log10(2) Into ST(0) 

11011 001 :1110 1100: 8 2 
NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 


If CW.PC indicates 53 bit precision then subtract 11 clocks. 


NOs 


If there is a numeric error pending from a previous instruction, add 17 clocks. 
If there is a numeric error pending from a previous instruction, add 18 clocks. 
The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
If ABS(operand) is greater than 2/4 then add n clocks, where n=(operand/(7/4)). 
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Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
FLDLN2 = Load loge(2) Into ST(0) 
11011 001 :11101101: 8 2 
ARITHMETIC 
FADD = Add Real with ST(0) 
ST(0)<-ST(0) + 32-bit memory 
11011 000 : mod 000 r/m : s-i-b/disp. 10(8-20) 2 7(5-17) 
ST(0)<-ST(0) + 64-bit memory 
11011 100 : mod 000 r/m : s-i-b/disp. 10(8-20) 3 7(5-17) 
ST(d)<-ST(0) + ST(i) 
11011 d00 : 11000 ST(i) 10(8-20) 7(5-17) 
FADDP = Add real with ST(0) and Pop (ST(i)< ST(0) 
+ST(i)) 
11011 110 : 11000 ST(i) : 10(8-20) 7(5-17) 


NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 


3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 
If CW.PC indicates 53 bit precision then subtract 11 clocks. 


4. lf there is a numeric error pending from a previous instruction, add 17 clocks. 

5. If there is a numeric error pending from a previous instruction, add 18 clocks. 

6. The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
7. lf ABS(operand) is greater than 7/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 5 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
ARITHMETIC (Cont'd.) 
FSUB = Subtract Real from ST(0) 
ST(0)<ST(0) — 32-bit memory 
11011 000 : mod 100 r/m: s-i-b/disp. 10(8-20) 2 7(5-17) 
ST(0)<-ST(0) — 64-bit memory 
11011 100 : mod 100 r/m: s-i-b/disp. 10(8-20) 3 7(5-17) 
ST(d)<-ST(0) — ST(i) 
11011 dOO : 11001 ST(i) 10(8-20) 7(5-17) 
FSUBP = Subtract real from ST(0) and Pop (ST(i)< ST(0) 
-ST(i)) 
11011 110: 11001 ST(i) 10(8-20) 7(5-17) 
FSUBR = Subtract Real reversed (Subtract ST(0) from 
Real) 
ST(0)<-32-bit memory — ST(0) 
11011 000 : mod 101 r/m: s-i-b/disp. 10(8-20) 2 7(5-17) 
ST(0)<-64-bit memory — ST(0) 
11011 100 : mod 101 r/m: s-i-b/disp. 10(8-20) 3 7(5-17) 
ST(d)<-ST(i) — ST(0) 
11011 dOO : 11001 ST(i) 10(8-20) 7(5-17) 
FSUBRP = Subtract Real reversed and Pop (ST(i)< ST(i) 
-ST(0)) 
11011 110 : 11100 ST(i) 10(8-20) 7(5-17) 
FMUL = Multiply Real with ST(0) 
ST(0)<-ST(0) X 32-bit memory 
11011 000 : mod 001 r/m: s-i-b/disp. 11 2 8 
ST(0)<-ST(0) X 64-bit memory 
NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 


If CW.PC indicates 53 bit precision then subtract 11 clocks. 
If there is a numeric error pending from a previous instruction, add 17 clocks. 


The INT pin is polled several times while this function is executing to ensure short interrupt latency. 


4. 
5. If there is a numeric error pending from a previous instruction, add 18 clocks. 
6 
7 


If ABS(operand) is greater than 2/4 then add n clocks, where n=(operand/(7/4)). 
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Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
11011 100 : mod 001 r/m: s-i-b/disp. 14 3 11 
ST(d)<-ST(0) X ST(i) 
11011 d0O : 11001 ST(i) 16 13 
FMULP = Multiply ST(0) with ST(i) and Pop (ST(i)<— ST(0) 
XST(i)) 
11011 110 : 11001 ST(i) 16 13 
FDIV = Divide ST(0) by Real 
ST(0)<-ST(0)/ 32-bit memory 
11011 000 : mod 110 r/m: s-i-b/disp. 73 2 70 3 
ST(0)<-ST(0)/ 64-bit memory 
11011 100 : mod 110 r/m: s-i-b/disp. 73 3 70 3 
ST(d)<-ST(0)/ ST(i) 
11011 dOO : 11111 ST(i) 73 70 3 
FDIVP = Divide ST(0) by ST(i) and Pop (ST(i)< ST(0)/ 
ST(i)) 
11011 110 : 11111 ST(i) 73 70 3 
FDIVR = Divide real reversed (Real/ST(0)) 
ST(0)< 32-bit memory/ ST(0) 
11011 000 : mod 111 r/m: s-i-b/disp. 73 2 70 3 
ST(0)< 64-bit memory/ ST(0) 
11011 100 : mod 111 r/m: s-i-b/disp. 73 3 70 3 
ST(d)<— ST(i)/ ST(0) 
11011 dOO : 11110 ST(i) 73 70 3 
FDIVRP = Divide real reversed and Pop (ST(i)<— ST(i)/ 
ST(0)) 
11011 110 : 11110 ST(i) 73 70 3 


NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 


3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 
If CW.PC indicates 53 bit precision then subtract 11 clocks. 


4. lf there is a numeric error pending from a previous instruction, add 17 clocks. 

5. If there is a numeric error pending from a previous instruction, add 18 clocks. 

6. The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
7. lf ABS(operand) is greater than 7/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 7 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
FIADD = Add Integer to ST(0) 
ST(0)<-ST(0) + 16-bit memory 
11011 110 : mod 000 r/m: s-i-b/disp. 24(20-35) 2 7(5-17) 
ST(0)<ST(0) + 32-bit memory 
11011 010 : mod 000 r/m: s-i-b/disp. 22.5(19-32) 2 7(5-17) 
FISUB = Subtract Integer from ST(0) 
ST(0)<-ST(0) — 16-bit memory 
11011 110 : mod 100 r/m: s-i-b/disp. 24(20-35) 2 7(5-17) 
ST(0)<ST(0) — 32-bit memory 
11011 010 : mod 100 r/m: s-i-b/disp. 22.5(19-32) 2 7(5-17) 
FISUBR = Integer Subtract Reversed 
ST(0)<-16-bit memory-ST(0) 
11011 110 : mod 101 r/m: s-i-b/disp. 24(20-35) 2 7(5-17) 
ST(0)<-32-bit memory-ST(0) 
11011 010 : mod 101 r/m: s-i-b/disp. 22.5(19-32) 2 7(5-17) 


FIMUL = Multiply Integer with ST(0) 
ST(0)<-ST(0) X 16-bit memory 


11011 110 : mod 101 r/m: s-i-b/disp. 25(23-27) 2 8 
ST(0)<ST(0) X 32-bit memory 

11011 010 : mod 001 r/m : s-i-b/disp. 23.5(19-32) 2 8 
FIDIV = Integer Divide 
ST(0)<-ST(0)/ 16-bit memory 

11011 110 : mod 110 r/m: s-i-b/disp. 87(85-89) 2 70 3 
ST(0)<-ST(0)/ 32-bit memory 

11011 010 : mod 110 r/m: s-i-b/disp. 85.5(84-86) 2 70 3 
NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 

If CW.PC indicates 53 bit precision then subtract 11 clocks. 
4. If there is a numeric error pending from a previous instruction, add 17 clocks. 
5. If there is a numeric error pending from a previous instruction, add 18 clocks. 
6. The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
7. lf ABS(operand) is greater than 2/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 8 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
FIDVR = Integer Divide Reversed 
ST(0)<-16-bit memory/ST(0) 
11011 110 : mod 111 r/m: s-i-b/disp. 87(85-89) 2 70 3 
ST(0)<-32-bit memory/ST(0) 
11011 010 : mod 111 r/m: s-i-b/disp. 85.5(84-86) 2 70 3 
FSQRT = Square Root 
11011 001 : 1111 1010 85.5(83-87) 70 
FSCALE = Scale ST(0) by ST(1) 
11011 001 : 1111 1101 31 (30-32) 2 
FXTRACT = Extract Components of ST(0) 
11011 001 : 1111 0100 19(16-20) 4(2-4) 
FPREM = Partial Reminder 
11011 001 : 1111 1000 84(70-138) 2(2-8) 
FPREM1 = Partial Reminder (IEEE) 
11011 001 : 1111 0101 94.5(72-167) 5.5(2-18) 


NOTES: 


1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 

If CW.PC indicates 53 bit precision then subtract 11 clocks. 


NOR OURS 
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If there is a numeric error pending from a previous instruction, add 17 clocks. 
If there is a numeric error pending from a previous instruction, add 18 clocks. 
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INSTRUCTION SET SUMMARY 


Table 12-20. Floating-Point Clock Count Summary (Sheet 9 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
FRNDINT = Round ST(0) to Integer 
11011 001 : 1111 1100 29.1(21-30) 7.4(2-8) 
FABS = Absolute value of ST(0) 
11011 001 : 1110 0001 
FCHS = Change Sign of ST(0) 
11011 001 : 1110 0000 
TRANSCENDENTAL 
FCOS = Cosine of ST(0) 
11011 001 : 11111111 241 (193-279) 2 6,7 
FPTAN = Partial Tangent of ST(0) 
11011 001 : 1111 0010 244(200-273) 70 6,7 
FPATAN = Partial Arctangent 
11011 001 : 1111 0011 289(218-303) 5(2-17) 6 
FSIN = Sine of ST(0) 
11011 001 : 11111110 241 (193-279) 2 6,7 
FSINCOS = Sine and Cosine of ST(0) 
11011 001 : 1111 1011 291 (243-329) 2 6,7 
F2XM1 = 2ST(0)-1 
11011 001 : 1111 0000 242(140-279) 2 6 
FYL2X = ST(1) x log2(ST(0)) 
11011 001 : 1111 0001 311(196-329) 13 6 
FYL2XP1 = ST(1) x log2(ST(0) + 1.0) 
11011 001 : 1111 1001 313(171-326) 13 6 
PROCESSOR CONTROL 
[FinT=InitiaizeFPU ti‘ tC dd 
NOTES: 


1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 
If CW.PC indicates 53 bit precision then subtract 11 clocks. 
If there is a numeric error pending from a previous instruction, add 17 clocks. 


The INT pin is polled several times while this function is executing to ensure short interrupt latency. 


4. 
5. If there is a numeric error pending from a previous instruction, add 18 clocks. 
6 
7 


If ABS(operand) is greater than 2/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 10 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
11011 001 : 11100011 17 4 
FSTSW AX = Store status word into AX 
11011 111: 1110 0000 3 5 
FSTSW = Store status word into memory 
11011 101 : mod 111 r/m: s-i-b/disp. 3 5 


NOTES: 

1. If operand 
2. If operand 
3 


is O clock counts = 27. 
is 0 clock counts = 28. 


If CW.PC indicates 24 bit precision then subtract 38 clocks. 
If CW.PC indicates 53 bit precision then subtract 11 clocks. 


Or 
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If there is a numeric error pending from a previous instruction, add 17 clocks. 
If there is a numeric error pending from a previous instruction, add 18 clocks. 
The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
If ABS(operand) is greater than 7/4 then add n clocks, where n=(operand/(7/4)). 
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Table 12-20. Floating-Point Clock Count Summary (Sheet 11 of 12) 


Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
FLDCW = Load control word 
11011 001 : mod 101 r/m: s-i-b/disp. 4 2 
FSTCW = Store control word 
11011 001 : mod 111 r/m: s-i-b/disp. 3 5 
FCLEX = Clear exceptions 
11011 011 : 1110 0010 4 
FSTENV = Store environment 
11011 011 : mod 110 r/m: s-i-b/disp. 
Real and Virtual Modes 16-bit address 67 4 
Real and Virtual Modes 32-bit address 67 4 
Protected Mode 16-bit address 56 4 
Protected Mode 32-bit address 56 4 
FLDENV = Load Environment 
11011 011 : mod 100 r/m: s-i-b/disp. 
Real and Virtual Modes 16-bit address 44 2 
Real and Virtual Modes 32-bit address 44 2 
Protected Mode 16-bit address 34 2 
Protected Mode 32-bit address 34 2 
FSAVE = Save State 
11011 101 : mod 110 r/m: s-i-b/disp. 
Real and Virtual Modes 16-bit address 154 4 
Real and Virtual Modes 32-bit address 154 4 
Protected Mode 16-bit address 143 4 
Protected Mode 32-bit address 143 4 
FRSTOR = Restore State 
11011 101 : mod 100 r/m: s-i-b/disp. 


NOTES: 

1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 
3. 


If CW.PC indicates 24 bit precision then subtract 38 clocks. 


If CW.PC indicates 53 bit precision then subtract 11 clocks. 


NO. 


If there is a numeric error pending from a previous instruction, add 17 clocks. 
If there is a numeric error pending from a previous instruction, add 18 clocks. 
The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
If ABS(operand) is greater than 7/4 then add n clocks, where n=(operand/(1/4)). 
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Concurrent 
Cache Hit Execution 
Avg (Lower | Penalty Avg 
Instruction Format Range... if Cache (Lower Notes 
Upper Miss Range- 
Range) Upper 
Range) 
Real and Virtual Modes 16-bit address 131 23 
Real and Virtual Modes 32-bit address 131 27 
Protected Mode 16-bit address 120 23 
Protected Mode 32-bit address 120 27 
FINCSTP = Increment Stack Pointer 
11011 001 : 11110111 3 
FDECSTP = Decrement Stack Pointer 
11011 001 : 11110110 3 
FFREE = Free ST(i) 
11011 101 : 11000 ST(i) 3 
FNOP = No Operations 
11011 101 : 1101 0000 3 


WAIT = Wait until FPU ready (min/max) 


NOTES: 
1. If operand is 0 clock counts = 27. 
2. If operand is 0 clock counts = 28. 


3. If CW.PC indicates 24 bit precision then subtract 38 clocks. 
If CW.PC indicates 53 bit precision then subtract 11 clocks. 


NOs 
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If there is a numeric error pending from a previous instruction, add 17 clocks. 
If there is a numeric error pending from a previous instruction, add 18 clocks. 
The INT pin is polled several times while this function is executing to ensure short interrupt latency. 
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APPENDIX A 
SIGNAL DESCRIPTIONS 


For pin diagrams and pin locations, refer to the individual processor datasheets. 


Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 1 of 8) 


Symbol | Type Name and Function 


CLK | | Clock provides the fundamental timing and the internal operating frequency for the 
Intel486 processor. All external timing parameters are specified with respect to the 
rising edge of CLK. 


ADDRESS BUS 


A[31:4], V/O |The Address Lines A[31 :2], together with the byte enables signals BE[3:0]#, define the 
A[3:2] OQ | physical area of memory or input/output space accessed. Address lines A[31:4] are 
used to drive addresses to the processor to perform cache line invalidations. Input 
signals must meet setup and hold times tyo and tog. A[31:2] are not driven during bus or 
address hold. 


BE[3:0]# OQ |The Byte Enable signals indicate active bytes during read and write cycles. During the 
first cycle of a cache fill, the external system should assume that all byte enables are 
active. BE3# applies to D[31:24], BE2# applies to D[23:16], BE1# applies to D[15:8] and 
BEO# applies to D[7:0]. BE[3:0]# are active low and are not driven during bus hold. 


DATA BUS 


D[31:0] /O |The Data Lines D[7:0] define the least significant byte of the data bus and lines 
D[31:24] define the most significant byte of the data bus. These signals must meet 
setup and hold times to. and to3 for proper operation on reads. These pins are driven 
during the second and subsequent clocks of write cycles. 


DATA PARITY 


DP[3:0] /O | One Data Parity pin exists for each byte of the data bus. Data parity is generated on all 
write data cycles with the same timing as the data driven by the Intel486 processor. 
Even parity information must be driven back into the processor on the data parity pins 
with the same timing as read information to ensure that the correct parity check status is 
indicated by the Intel486 processor. The signals read on these pins do not affect 
program execution. 


Input signals must meet setup and hold times too and to3. DP[3:0] should be connected 
to Vcc through a pull-up resistor in systems that do not use parity. DP[3:0] are active 
high and are driven during the second and subsequent clocks of write cycles. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 2 of 8) 


Symbol | Type Name and Function 

M/lO# OQ | The Memory/Input-Output, Data/Control and Write/Read lines are the primary bus 
D/C# O_ |definition signals. These signals are driven valid as the ADS# signal is asserted. 
ae . M/lO# D/C# W/R# Bus Cycle Initiated 

0 0 0 Interrupt Acknowledge 

0 0 1 Halt/Special Cycle 

0 1 0 1/0 Read 

0 1 1 1/O Write 

1 0 0 Code Read 

1 0 1 Reserved 

1 1 0 Memory Read 

1 1 1 Memory Write 


The bus definition signals are not driven during bus hold and follow the timing of the 
address bus. Refer to section 10.2.11, “Special Bus Cycles,” for a description of the 
special bus cycles. 


LOCK# QO | The Bus Lock pin indicates that the current bus cycle is locked. The Intel486 processor 
does not allow a bus hold when LOCK¢# is asserted (but address holds are allowed). 
LOCK# goes active in the first clock of the first locked bus cycle and goes inactive after 
the last clock of the last locked bus cycle. The last locked cycle ends when ready is 
asserted. LOCK is active low and is not driven during bus hold. Locked read cycles are 
not transformed into cache fill cycles when KEN# is asserted. 


PLOCK# OQ | The Pseudo-Lock pin indicates that the current bus transaction requires more than one 
bus cycle to complete. For the Intel486 processor, examples of such operations are 
segment table descriptor reads (64 bits) and cache line fills (128 bits). For Intel486 
processors with an on-chip Floating-Point Unit, floating-point long reads and writes (64 
bits) also require more than one bus cycle to complete. 


The Intel486 processor asserts PLOCK# until the addresses for the last bus cycle of the 
transaction have been driven, regardless of whether RDY# or BRDY# have been 
asserted. 

Normally PLOCK# and BLAST# are the inverse of each other. However, during the first 
bus cycle of a 64-bit floating-point write (for Intel486 processors with on-chip Floating- 
Point Unit) both PLOCK# and BLAST# are asserted. 

PLOCK¢# is a function of the BS8#, BS16# and KEN# inputs. PLOCK# should be 
sampled only in the clock in which ready is asserted. PLOCK# is active low and is not 
driven during bus hold. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 3 of 8) 


Symbol | Type | Name and Function 
BUS CONTROL 
ADS# O |The Address Status output indicates that a valid bus cycle definition and address are 


available on the cycle definition lines and address bus. ADS# is driven active in the 
same clock in which the addresses are driven. ADS# is active low and is not driven 
during bus hold. 


RDY# | |The Non-burst Ready input indicates that the current bus cycle is complete. RDY# 
indicates that the external system has presented valid data on the data pins in response 
to a read or that the external system has accepted data from the Intel486 processor in 
response to a write. RDY# is ignored when the bus is idle and at the end of the first 
clock of the bus cycle. 


RDY# is active during address hold. Data can be returned to the processor while 
AHOLD is active. 


RDY# is active low, and is not provided with an internal pull-up resistor. RDY# must 
satisfy setup and hold times tg and t,7 for proper chip operation. 


BURST CONTROL 


BRDY# | |The Burst Ready input performs the same function during a burst cycle that RDY# 
performs during a non-burst cycle. BRDY# indicates that the external system has 
presented valid data in response to a read or that the external system has accepted 
data in response to a write. BRDY# is ignored when the bus is idle and at the end of the 
first clock in a bus cycle. 


BRDY# is sampled in the second and subsequent clocks of a burst cycle. The data 
presented on the data bus is strobed into the processor when BRDY# is sampled 
asserted. When RDY# is asserted simultaneously with BRDY#, BRDY# is ignored and 
the burst cycle is prematurely aborted. 


BRDY# is active low and is provided with a small pull-up resistor. BRDY# must satisfy 
the setup and hold times ty, and t;7. 


BLAST# OQ | The Burst Last signal indicates that the next time BRDY# is asserted, the burst bus 
cycle is complete. BLAST# is active for both burst and non-burst bus cycles. BLAST# is 
active low and is not driven during bus hold. 


INTERRUPTS 


RESET | |The Reset input forces the Intel486 processor to begin execution at a known state. The 
processor cannot begin execution of instructions until at least 1 ms after Voc and CLK 
have reached their proper DC and AC specifications. The RESET pin should remain 
active during this time to ensure proper processor operation. RESET is active high. 
RESET is asynchronous but must meet setup and hold times too and to, for recognition 
in any specific clock. 


INTR | |The Maskable Interrupt indicates that an external interrupt has been generated. When 
the internal interrupt flag is set in EFLAGS, active interrupt processing is initiated. The 
Intel486 processor generates two locked interrupt acknowledge bus cycles in response 
to the INTR pin being asserted. INTR must remain active until the interrupt 
acknowledges have been performed to ensure that the interrupt is recognized. 


INTR is active high and is not provided with an internal pull-down resistor. INTR is 
asynchronous, but must meet setup and hold times tag and to, for recognition in any 
specific clock. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 4 of 8) 


Symbol 


Type 


Name and Function 


NMI 


The Non-Maskable Interrupt request signal indicates that an external non-maskable 
interrupt has been generated. NMI is rising edge sensitive. NMI must be held low for at 
least four CLK periods before this rising edge. NMI is not provided with an internal pull- 
down resistor. NMI is asynchronous, but must meet setup and hold times tog and to, for 
recognition in any specific clock. 


SRESET 


The Soft Reset pin duplicates all the functionality of the RESET pin with the following 

two exceptions: 

1. The SMBASE register retains its previous value. 

2. When UP# (I) is asserted, SRESET does not have an effect on the host processor. 

For soft resets, SRESET should remain active for at least 15 CLK periods. SRESET is 
active high. SRESET is asynchronous but must meet setup and hold times tog and to, 

for recognition in any specific clock. 


SMI# 


The System Management Interrupt input is used to invoke System Management 
Mode (SMM). SMI# is a falling edge triggered signal that forces the processor into SMM 
at the completion of the current instruction. SMI# is recognized on an instruction 
boundary and at each iteration for repeat string instructions. SMI# does not break 
LOCKed bus cycles and cannot interrupt a currently executing SMM. The processor 
latches the falling edge of one pending SMI# signal while the processor is executing an 
existing SMl#. The nested SMI# is not recognized until after the execution of a Resume 
(RSM) instruction. 


SMIACT# 


STPCLK# 


The System Management Interrupt Active is an active low output, indicating that the 
processor is operating in SMM. It is asserted when the processor begins to execute the 
SMI# state save sequence and remains asserted (low) until the processor executes the 
last state restore cycle out of SMRAM. 


The Stop Clock Request input signal indicates that a request has been made to turn 
off the CLK input. When the processor recognizes a STPCLK#, the processor stops 
execution on the next instruction boundary, unless superseded by a higher priority 
interrupt, empties all internal pipelines and the write buffers, and generates a Stop 
Grant acknowledge bus cycle. STPCLK# is active low and is provided with an internal 
pull-up resistor. 


STPCLK# is an asynchronous signal, but must remain active until the processor issues 
the Stop Grant bus cycle. STPCLK# may be deasserted at any time after the processor 
has issued the Stop Grant bus cycle. 


BUS ARBITRATION 


BREQ 


The Bus Request signal indicates that the Intel486 processor has internally generated 
a bus request. BREQ is generated whether or not the Intel486 processor is driving the 
bus. BREQ is active high and is never floated. 


HOLD 


The Bus Hold request allows another bus master complete control of the processor 
bus. In response to HOLD going active, the Intel486 processor floats most of its output 
and input/output pins. HLDA is asserted after completing the current bus cycle, burst 
cycle or sequence of locked cycles. The Intel486 processor remains in this state until 
HOLD is deasserted. HOLD is active high and is not provided with an internal pull-down 
resistor. HOLD must satisfy setup and hold times t;g and t;9 for proper operation. 


HLDA 


A-4 


Hold Acknowledge goes active in response to a hold request presented on the HOLD 
pin. HLDA indicates that the Intel486 processor has given the bus to another local bus 
master. HLDA is driven active in the same clock in which the Intel486 processor floats 
its bus. HLDA is driven inactive when leaving bus hold. HLDA is active high and 
remains driven during bus hold. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 5 of 8) 


Symbol 


Type 


Name and Function 


BOFF# 


The Backoff input forces the Intel486 processor to float its bus in the next clock. The 
processor floats all pins normally floated during bus hold but HLDA is not asserted in 
response to BOFF#. BOFF# has higher priority than RDY# or BRDY#; when both are 
asserted in the same clock, BOFF# takes effect. The processor remains in bus hold 
until BOFF# is negated. If a bus cycle was in progress when BOFF# was asserted, the 
cycle is restarted. BOFF# is active low and must meet setup and hold times t;g and tig 
for proper operation. 


CACHE INVALIDATION 


AHOLD 


The Address Hold request allows another bus master access to the processor’s 
address bus for a cache invalidation cycle. The Intel486 processor stops driving its 
address bus in the clock following AHOLD going active. Only the address bus is floated 
during address hold, the remainder of the bus remains active. AHOLD is active high and 
is provided with a small internal pull-down resistor. For proper operation AHOLD must 
meet setup and hold times tyg and tyg. 


EADS# 


This signal indicates that a valid External Address has been driven onto the Intel486 
processor address pins. This address is used to perform an internal cache invalidation 
cycle. EADS# is active low and is provided with an internal pull-up resistor. EADS# must 
satisfy setup and hold times tj. and t;3 for proper operation. 


CACHE CONTROL 


KEN# 


FLUSH# 


The Cache Enable pin is used to determine whether the current cycle is cacheable. 
When the Intel486 processor generates a cycle that can be cached and KEN# is active 
one clock before RDY# or BRDY# during the first transfer of the cycle, the cycle 
becomes a cache line fill cycle. Asserting KEN# one clock before RDY# during the last 
read in the cache line fill causes the line to be placed in the on-chip cache. KEN# is 
active low and is provided with a small internal pull-up resistor. KEN# must satisfy setup 
and hold times ty4 and ty5 for proper operation. 


The Cache Flush input forces the Intel486 processor to flush its entire internal cache. 
FLUSH# is active low and need only be asserted for one clock. FLUSH# is 
asynchronous but setup and hold times tog and tp, must be met for recognition in any 
specific clock. 


PAGE CACHEABILITY 


PWT 
PCD 


[oxe) 


The Page Write-Through and Page Cache Disable pins reflect the state of the page 
attribute bits, PWT and PCD, in the page table entry, page directory entry or control 
register 3 (CR3) when paging is enabled. When paging is disabled, the processor 
ignores the PCD and PWT bits and assumes they are zero for the purpose of caching 
and driving PCD and PWT pins. PWT and PCD have the same timing as the cycle 
definition pins (M/lO#, D/C#, and W/R#). PWT and PCD are active high and are not 
driven during bus hold. PCD is masked by the cache disable bit (CD) in Control 
Register 0. 


BUS SIZE CONTROL 


BS16# 
BS8# 


The Bus Size 16 and Bus Size 8 pins (bus sizing pins) cause the Intel486 processor to 
run multiple bus cycles to complete a request from devices that cannot provide or 
accept 32 bits of data in a single cycle. The bus sizing pins are sampled every clock. 
The state of these pins in the clock before ready is used by the Intel486 processor to 
determine the bus size. These signals are active low and are provided with internal pull- 
up resistors. These inputs must satisfy setup and hold times t;4 and t;5 for proper 
operation. 


These pins are not present on the Ultra-Low Power Intel486 SX and GX processors. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 6 of 8) 


Symbol | Type Name and Function 
ADDRESS MASK 
A20M# | |When the Address Bit 20 Mask pin is asserted, the Intel486 processor masks physical 


address bit 20 (A20) before performing a lookup to the internal cache or driving a 
memory cycle on the bus. A2OM# emulates the address wraparound at one Mbyte, 
which occurs on the 8086 processor. A20M# is active low and should be asserted only 
when the processor is in Real Mode. This pin is asynchronous but should meet setup 
and hold times tog and tp; for recognition in any specific clock. For proper operation, 
A20M# should be sampled high at the falling edge of RESET. 


TEST ACCESS PORT 


TCK | | Test Clock is an input to the Intel486 processor and provides the clocking function 
required by the JTAG Boundary scan feature. TCK is used to clock state information 
and data into component on the rising edge of TCK on TMS and TDI, respectively. Data 
is clocked out of the part on the falling edge of TCK and TDO. TCK is provided with an 
internal pull-up resistor. 


TDI | | Test Data Input is the serial input used to shift JTAG instructions and data into 
component. TDI is sampled on the rising edge of TCK, during the SHIFT-IR and SHIFT- 
DR TAP controller states. During all other tap controller states, TDI is a “don't care.” TDI 
is provided with an internal pull-up resistor. 


TDO OQ | Test Data Output is the serial output used to shift JTAG instructions and data out of the 
component. TDO is driven on the falling edge of TCK during the SHIFT-IR and SHIFT- 
DR TAP controller states. At all other times TDO is driven to the high impedance state. 


TMS | | Test Mode Select is decoded by the JTAG TAP (Tap Access Port) to select the 
operation of the test logic. TMS is sampled on the rising edge of TCK. To guarantee 
deterministic behavior of the TAP controller TMS is provided with an internal pull-up 
resistor. 


PERFORMANCE UPGRADE SUPPORT 


Reserved# | | The Reserved input detects the presence of the in-circuit emulator, then powers down 
the core, and three-states all outputs of the original processor, so that the original 
processor consumes very low current. Reserved# is active low and sampled at all 
times, including after power-up and during reset. 


NUMERIC ERROR REPORTING FOR IntelDX2™ AND IntelDX4™ PROCESSORS 


FERR# O_ | The Floating-Point Error pin is driven active when a floating-point error occurs. FERR# 
is similar to the ERROR# pin on the Intel387 Math CoProcessor. FERR# is included for 
compatibility with systems using DOS type floating-point error reporting. FERR# does 
not go active when FP errors are masked in FPU register. FERR# is active low, and is 
not floated during bus hold. 


This pin is not present on the Ultra-Low Power Intel486 SX and GX processors. 


IGNNE# | |When the Ignore Numeric Error pin is asserted the processor ignores a numeric error 
and continue executing non-control floating-point instructions, but FERR# is still 
activated by the processor. When IGNNE# is deasserted, the processor freezes on a 
non-control floating-point instruction, when a previous floating-point instruction caused 
an error. IGNNE# has no effect when the NE bit in control register 0 is set. IGNNE# is 
active low and is provided with a small internal pull-up resistor. IGNNE# is 
asynchronous but setup and hold times tgp and tp, must be met to insure recognition on 
any specific clock. 


This pin is not present on the Ultra-Low Power Intel486 SX and GX processors. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 7 of 8) 


Symbol 


Type | Name and Function 


WRITE-BACK ENHANCED IntelDX4™ PROCESSOR SIGNAL PINS 


CACHE# 


The CACHE# output indicates internal cacheability on read cycles and burst write-back 
on write cycles. CACHE# is asserted for cacheable reads, cacheable code fetches and 
write-backs. It is driven inactive for non-cacheable reads, I/O cycles, special cycles, and 
write-through cycles. 


FLUSH# 


Cache Flushi is an existing pin that operates differently when the processor is 
configured as Enhanced Bus mode (write-back). FLUSH# causes the processor to write 
back all modified lines and flush (invalidate) the cache. FLUSH# is asynchronous, but 
must meet setup and hold times tog and tp; for recognition in any specific clock. 


HITM# 


INV 


PLOCK# 


SRESET 


The Hit/Miss to a Modified Line pin is a cache coherency protocol pin that is driven 
only in Enhanced Bus mode. When a snoop cycle is run, HITM# indicates that the 
processor contains the snooped line and that the line has been modified. Assertion of 
HITM# implies that the line is written back in its entirety, unless the processor is already 
in the process of doing a replacement write-back of the same line. 


The Invalidation Request pin is a cache coherency protocol pin that is used only in 
Enhanced Bus mode. It is sampled by the processor on EADS#-driven snoop cycles. It 
is necessary to assert this pin to get the effect of the processor invalidate cycle on write- 
through-only lines. INV also invalidates the write-back lines. However, when the 
snooped line is modified, the line is written back and then invalidated. INV must satisfy 
setup and hold times ty. and t;3 for proper operation. 


In the Enhanced bus mode, Pseudo-Lock Output is always driven inactive. In this 
mode, a 64-bit data read (caused by an FP operand access or a segment descriptor 
read) is treated as a multiple cycle read request, which may be a burst or a non-burst 
access based on whether BRDY# or RDY# is asserted by the system. Because only 
write-back cycles (caused by Snoop write-back or replacement write-back) are write 
burstable, a 64-bit write is driven out as two non-burst bus cycles. BLAST# is asserted 
during both writes. Refer to the Bus Functional Description section 10.3 for details on 
Pseudo-Locked bus cycles. 


For the Write-Back Enhanced Intel486 processors, Soft Reset operates similar to other 
Intel486 processors. On SRESET, the internal SMRAM base register retains its 
previous value, does not flush, write-back or disable the internal cache. Because 
SRESET is treated as an interrupt, it is possible to have a bus cycle while SRESET is 
asserted. SRESET is serviced only on an instruction boundary. SRESET is 
asynchronous but must meet setup and hold times too and to; for recognition in any 
specific clock. 


WB/WT# 


The Write-Back/Write-Through pin enables Enhanced Bus mode (write-back cache). 
It also defines a cached line as write-through or write-back. For cache configuration, 
WB/WT# must be valid during RESET and be active for at least two clocks before and 
two clocks after RESET is deasserted. To define write-back or write-through 
configuration of a line, WB/WT# is sampled in the same clock as the first RDY# or 
BRDY# is asserted during a line fill (allocation) cycle. 
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Table A-1. Embedded Intel486™ Processor Pin Descriptions (Sheet 8 of 8) 


Symbol | Type Name and Function 
IntelDX4™ PROCESSOR CLKMUL, Vc¢-5, AND VOLDET 
CLKMUL | | The Clock Multiplier input, defined during device RESET, defines the ratio of internal 


core frequency to external bus frequency. When sampled low, the core frequency 
operates at twice the external bus frequency (speed doubled mode). When driven high 
or left floating, speed triple mode is selected. CLKMUL has an internal pull-up speed to 
Voc and may be left floating in designs that select speed tripled clock mode. 


This pin is present only on the IntelDX4 processor. 


Vecd | |The 5 V Reference Voltage input is the reference voltage for the 5 V-tolerant I/O 
buffers. This signal should be connected to +5 V +5% for use with 5 V logic. When all 
inputs are from 3 V logic, this pin should be connected to 3.3 V. 


This pin is present only on the IntelDX4 processor. 
VOLDET QO |A Voltage Detect signal allows external system logic to distinguish between a 5 V 
Intel486 processor and the 3.3 V IntelDX4 processor. This signal is active low for a 


3.3 V IntelDX4 processor. This pin is available only on the PGA version of the IntelDX4 
processor. 


This pin is present only on the IntelDX4 processor. 
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APPENDIX B 
TESTABILITY 


Testing for the Intel486™ processor can be divided into two categories: Built-in Self Test (BIST) 
and external testing. The BIST tests the non-random logic, control ROM (CROM), translation 
lookaside buffer (TLB) and on-chip cache memory. External tests can be run on the TLB and the 
on-chip cache. The Intel486 processor also has a test mode in which all outputs are three-stated. 


B.1 BUILT-IN SELF TEST (BIST) 


The BIST is initiated by holding the AHOLD (address hold) high for one CLK after RESET goes 
from high to low, as shown in Figure 9.6. The Intel486 processor does not run bus cycles until 
the BIST is concluded. Note that for the Intel486 processor, the RESET must be active for 15 
clocks with or without BIST enabled for warm resets. SRESET should not be driven active (1.e., 
high) when entering or during BIST. See Table B-1 for approximate clocks and maximum com- 
pletion times for different Intel486 processors. 


The results of BIST is stored in the EAX register. The Intel486 processor has successfully passed 
the BIST if the contents of the EAX register are zero. When the results in EAX are not zero, then 
the BIST has detected a flaw in the Intel486 processor. The Intel486 processor performs reset and 
begins normal operation at the completion of the BIST. 


The non-random logic, control ROM, on-chip cache and translation lookaside buffer (TLB) are 
tested during the BIST. 


The cache portion of the BIST verifies that the cache is functional and that it is possible to read 
and write to the cache. The BIST manipulates test registers TR3, TR4 and TRS while testing the 
cache. These test registers are described in Section B.2, “On-Chip Cache Testing.” 


The cache testing algorithm writes a value to each cache entry, reads the value back, and checks 
that the correct value was read back. The algorithm may be repeated more than once for each of 
the 512 cache entries using different constants. The IntelDX4 processor has 1024 cache entries. 
All other Intel486 processors have 512 cache entries. 


The TLB portion of the BIST verifies that the TLB is functional and that it is possible to read and 
write to the TLB. The BIST manipulates test registers TR6 and TR7 while testing the TLB. TR6 
and TR7 are described in Section B.3.2, “TLB Test Registers TR6 and TR7.” 


B.2. ON-CHIP CACHE TESTING 


The on-chip cache testability hooks are designed to be accessible during the BIST and for assem- 
bly language testing of the cache. 


The Intel486 processor contains a cache fill buffer and a cache read buffer. For testability writes, 
data must be written to the cache fill buffer before it can be written to a location in the cache. Data 
must be read from a cache location into the cache read buffer before the processor can access the 
data. The cache fill and cache read buffer are both 128 bits wide. 
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B.2.1_ Cache Testing Registers TR3, TR4 andTR5 


Figure B-1 shows the three cache testing registers: the Cache Data Test Register (TR3), the Cache 
Status Test Register (TR4) and the Cache Control Test Register (TRS). External access to these 
registers is provided through MOV reg, TREG and MOV TREG, reg instructions. 


Table B-1. Maximum BIST Completion Time 


B-2 


Processor Core Clock Approximate Approximate Time for 
Type Freq. Clocks Completions 
Intel486™ SX 25 MHz 1.05 million 42 milliseconds 
IntelDX2™ 50 MHz 0.6 million 24 milliseconds 
IntelDX4™ 75 MHz 1.6 million 22 milliseconds 

31 0 
TR3 

Data Cache Data 


Test Register 


Oo 


31 11109 8 7 6 5 4 3 2 1 


TR4 
Unused Cache Status 
Test Register 


LRU Bits Valid Bits 


(used only (used only 
during reads) | during reads) 


12 11 4 3 2 1 (0) 
TR5 
Set Select Entry Control | Cache Control 
Select Test Register 


Figure B-1. Cache Test Registers (All Intel486™ Processors Except the IntelDX4™ 
Processor) 
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31 0 


TR3 
Data Cache Data 
Test Register 


TR4 
Cache Status 
Test Register 


LRU Bits Valid Bits 


(used only (used only 
during reads) | during reads) 


12 11 4 


TR5 
Cache Control 
Test Register 


3 2 1 0 
Unused Set Select Enuy Control 
Select 


Figure B-2. IntelDX4™ Processor Cache Test Registers 


Cache Data Test Register: TR3 


The cache fill buffer and the cache read buffer can only be accessed through TR3. Data to be writ- 
ten to the cache fill buffer must first be written to TR3. Data read from the cache read buffer must 
be loaded into TR3. 


TR3 is 32 bits wide while the cache fill and read buffers are 128 bits wide. 32 bits of data must 
be written to TR3 four times to fill the cache fill buffer. 32 bits of data must be read from TR3 
four times to empty the cache read buffer. The entry select bits in TR5 determine which 32 bits 
of data TR3 will access in the buffers. 


Cache Status Test Register: TR4 


TR4 handles tag, LRU and valid bit information during cache tests. TR4 must be loaded with a 
tag and a valid bit before a write to the cache. After a read from a cache entry, TR4 contains the 
tag and valid bit from that entry, and the LRU bits and four valid bits from the accessed set. Note 
that the IntelDX4 processor has one less bit in the TR4 TAG field (see Figure B-1). 
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Cache Control Test Register: TR5 


TRS specifies the testability operation to be performed and the set and entry to be accessed. The 
set select field determines the set to be accessed. Note that the IntelDX4 processor has an 8-bit 
set select field and 256 sets. All other Intel486 processors have a 7-bit set select field and 128 sets 
(see Figure B-1). 


The function of the two entry select bits depends on the state of the control bits. When the fill or 
read buffers are being accessed, the entry select bits point to the 32-bit location in the buffer being 
accessed. When a cache location is specified, the entry select bits point to one of the four entries 
in a set (refer to Table B-2). 


Five testability functions can be performed on the cache. The two control bits in TR5 specify the 
operation to be executed. The five operations are: 


1. Write cache fill buffer 

2. Perform a cache testability write 
3. Perform a cache testability read 
4. Read the cache read buffer 

5. Perform a cache flush 


Table B-2 shows the encoding of the two control bits in TR5 for the cache testability functions. 
Table B-2 also shows the functionality of the entry and set select bits for each control operation. 


The cache tests attempt to use as much of the normal operating circuitry as possible. Therefore, 
when cache tests are being performed, the cache must be disabled (i.e.,the CD and NW bits in 
control register 0 (CRO) must be set to 1 to disable the cache). See Chapter 7, “On-Chip Cache.” 
for more information. 


B.2.2 Cache Testing Registers for the IntelDX4™ Processor 


The cache testing registers for the IntelDX4 processor differ slightly from the other Intel486 pro- 
cessors. TR3 in the IntelDX4 processor is identical to other Intel486 processors. TR4 in the 
IntelD-X4 processor uses bits 31 to 12 for the Tag field, and bit 11 is unused. TRS uses bits 11 to 
4 for the Set Select field. The Test Registers for the IntelDX4 processor are shown in Figure B-2. 


NOTE 
Software written for the Intel486 processor for testing the cache using the Test 
Register produces failures due to the changes in the TAG bits and Set Select 
bits for the IntelDX4 processor. 


Rewrite the code to take into account the 20 TAG bits and 8 Set Select bits to 
address the larger cache. 
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B.2.3 Cache Testability Write 


TESTABILITY 


A testability write to the cache is a two step process. First the cache fill buffer must be loaded 
with 128 bits of data and TR4 loaded with the tag and valid bit. Next the contents of the fill buffer 
are written to a cache location. 


Loading the fill buffer is accomplished by first writing to the entry select bits in TR5 and setting 
the control bits in TRS to 00. The entry select bits identify one of four 32-bit locations in the cache 
fill buffer to put 32 bits of data. Following the write to TR5, TR3 is written with 32 bits of data 
which are immediately placed in the cache fill buffer. Writing to TR3 initiates the write to the 
cache fill buffer. The cache fill buffer is loaded with 128 bits of data by writing to TR5 and TR3 
four times using a different entry select location each time. 


Table B-2. Cache Control Bit Encoding and Effect of Control Bits on 
Entry Select and Set Select Functionality 


Control Bits 


Operation Entry Select Bits Function Set Select Bits 


Enable: Fill Buffer Write 
Read Buffer Read 


Select 32-bit location in 
fill/read buffer 


Perform Cache Write 


Select an entry in set 


Select a set to write to 


Perform Cache Read 


Select an entry in set 


Select a set to read from 


Perform Cache Flush 


TR4 must be loaded with the tag and valid bit (bit 10 in TR4) before the contents of the fill buffer 
are written to a cache location. The IntelDX4 processor has a 20-bit tag in TR4. All other Intel486 
processors use a 21-bit tag in TR4. 


The contents of the cache fill buffer are written to a cache location by writing TR5 with a control 
field of 01 along with the set select and entry select fields. The set select and entry select field 
indicate the location in the cache to be written. The normal cache LRU update circuitry updates 
the internal LRU bits for the selected set. 


Note that a cache testability write can only be done when the cache is disabled for replaces (the 
CD bit is control register 0 is reset to 1). Care must be taken when directly writing to entries in 
the cache. When the entry is set to overlap an area of memory that is being used in external mem- 
ory, that cache entry could inadvertently be used instead of the external memory. This is exactly 
the type of operation that one would desire if the cache were to be used as a high speed RAM. 
Also, a memory reference (or any external bus cycle) should not occur in between the move to 
TR4 and the move to TRS, in order to avoid having the value in TR4 change due to the memory 
reference. 
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B.2.4 Cache Testability Read 


A cache testability read is a two step process. First the contents of the cache location are read into 
the cache read buffer. Next the data is examined by reading it out of the read buffer. 


Reading the contents of a cache location into the cache read buffer is initiated by writing TR5 
with the control bits set to 10 and the desired set select and two-bit entry select. The IntelDX4 
processor has an eight-bit select field. All other Intel486 processors have a seven-bit select field. 
In response to the write to TR5, TR4 is loaded with the 21-bit tag field and the single valid bit 
from the cache entry read. TR4 is also loaded with the three LRU bits and four valid bits corre- 
sponding to the cache set that was accessed. The cache read buffer is filled with the 128-bit value 
which was found in the data array at the specified location. 


The contents of the read buffer are examined by performing four reads of TR3. Before reading 
TR3 the entry select bits in TR5 must loaded to indicate which of the four 32-bit words in the read 
buffer to transfer into TR3 and the control bits in TR5 must be loaded with 00. The register read 
of TR3 initiates the transfer of the 32-bit value from the read buffer to the specified general pur- 
pose register. 


Note that it is very important that the entire 128-bit quantity from the read buffer and also the 
information from TR4 be read before any memory references are allowed to occur. When mem- 
ory operations are allowed to happen, the contents of the read buffer will be corrupted. This is 
because the testability operations use hardware that is used in normal memory accesses for the 
Intel486 processor whether the cache is enabled or not. 


B.2.5 Flush Cache 


The control bits in TR5 must be written with 11 to flush the cache. None of the other bits in TR5 
have any meaning when 11 is written to the control bits. Flushing the cache resets the LRU bits 
and the valid bits to 0, but does not change the cache tag or data arrays. 


When the cache is flushed by writing to TRS the special bus cycle indicating a cache flush to the 
external system is not run (see Section 10.3.11, “Special Bus Cycles’). For normal operation, the 
cache should be flushed with the instruction INVD (Invalidate Data Cache) instruction or the 
WBINVD (Write-back and Invalidate Data Cache) instruction. 


B.2.6 Additional Cache Testing Features for Write-Back Enhanced IntelDX4™ 
Processor 


When in Enhanced Bus (Write-Back) mode, the Write-Back Enhanced IntelDX4™ cache testing 
is a superset of the Standard Bus (Write-Through) mode. The additional cache testing features for 
the Write-Back Enhanced IntelDX4 processor are summarized below. 
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There are two state bits per cache line (VH and VL) instead of one (V). The assignment of VH 
and VL state bits is shown in Table B-3. 


Table B-3. State Bit Assignments for the Write-Back Enhanced IntelDX4™ Processor 


State VH, VL 
M a | 
E 0, 1 
cS) 1,0 
| 0, 0 


The state assignments have been chosen so that VH is identical to the V-state of the IntelDX4 
processor, when the Write-Back Enhanced IntelDX4 is in Standard Bus mode and where only S 
and I states are possible. 


There are no changes to TR3 between the Standard Bus mode and the Enhanced Bus mode. The 
TR4 definition remains the same in Standard Bus mode as in Intel486 processors. The changes 
to TR4 in Enhanced Bus mode are shown in Figure B-3. 


In Enhanced Bus mode, the cache line state bits of all four lines of the set are no longer available, 
which eliminates the possibility of a conflicting definition of state bits for the selected entry. The 
entry’s state bits are moved to positions 0 and 1. 


TRS is also the same in Standard Bus mode as in standard Intel486 processors. A minor change 
to TR5 in Enhanced Bus mode is illustrated in Figure B-4. 


In Enhanced Bus mode, control bit TR5.SLF (bit 13) is added to allow 1,1 of TR5.CTL (bits 1:0) 
to perform two different kinds of cache flushes. When SLF=0, CTL=1,1 performs a single clock 
invalidate of all lines in the cache, which does not write-back M-state lines. When SLF=1, the 
specific line addressed is written back (IF in M-State) and invalidated. The state of SLF is signif- 
icant only when CTL=1,1. 


31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 1413 1211109 8 7 6 5 43 2 1 0 


Standard 
Bus Mode 
TR4 


Enhanced 
Bus Mode 
TR4 


Figure B-3. TR4 Definition for Standard and Enhanced Bus Modes for the 
Write-Back Enhanced IntelDX4™ Processor 
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31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 1413 1211109 8 7 6 5 43 2 1 0 


Standard 
reserved Set Addr ENT] CTL] Bus Mode 
TR5 
Ss Enhanced 
reserved L{' Set Addr ENT] CTL] Bus Mode 
FE TR5 


Figure B-4. TR5 Definition for Standard and Enhanced Bus Modes for the 
Write-Back Enhanced IntelDX4™ Processor 


B.3. TRANSLATION LOOKASIDE BUFFER (TLB) TESTING 


The Intel486 processor TLB testability hooks are similar to those in the Intel386 processor. The 
testability hooks have been enhanced to provide added test features and to include new features 
in the Intel486 processor. The TLB testability hooks are designed to be accessible during the 
BIST and for assembly language testing of the TLB. 


B.3.1 Translation Lookaside Buffer Organization 


The Intel486 processor TLB is 4-way set associative and has space for 32 entries. The TLB is 
logically split into three blocks shown in Figure B-5. 


The data block is physically split into four arrays, each with space for eight entries. An entry in 
the data block is 22 bits wide containing a 20-bit physical address and two bits for the page at- 
tributes. The page attributes are the PCD (page cache disable) bit and the PWT (page write- 
through) bit. Refer to Section 7.6, “Page Cacheability,” for a discussion of the PCD and PWT 
bits. 


The tag block is also split into four arrays, one for each of the data arrays. A tag entry is 21 bits 
wide containing a 17-bit linear address and four protection bits. The protection bits are valid (V), 
user/supervisor (U/S), read/write (R/W) and dirty (D). 


The third block contains eight three bit quantities used in the pseudo least recently used (LRU) 
replacement algorithm. These bits are called the LRU bits. Unlike the on-chip cache, the TLB re- 
places a valid line even when there is an invalid line in a set. 
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Figure B-5. TLB Organization 


B.3.2 TLB Test Registers TR6 and TR7 


The two TLB test registers are shown in Figure B-6. TR6 is the command test register and TR7 
is the data test register. External access to these registers is provided through MOV reg, TREG 
and MOV TREG, reg instructions. 


12.3.1.1 Command Test Register: TR6 


TR6 contains the tag information and control information used in a TLB test. Loading TR6 with 
tag and control information initiates a TLB write or lookup test. 


TR6 contains three bit fields, a 20-bit linear address (bits 31:12), seven bits for the TLB tag pro- 
tection bits (bits 11:5) and one bit (bit 0) to define the type of operation to be performed on the 
TLB. 


The 20-bit linear address forms the tag information used in the TLB access. The lower three bits 
of the linear address select which of the eight sets are accessed. The upper 17 bits of the linear 
address form the tag stored in the tag array. 
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TR7 
TLB Data 
Test Register 


Physical Address 


Replacement Pointer Select ei t Pointer (Writes) 
Hit Indication (Lookup) — Hit Location (Lookup) 


Figure B-6. TLB Test Registers 


The seven TLB tag protection bits are described below. 
V:The valid bit for this TLB entry 
D,D#:The dirty bit for/from the TLB entry 
U,U#:The user/supervisor bit for/from the TLB entry 
W,W#:The read/write bit for/from the TLB entry 
Two bits are used to represent the D, U/S and R/W bits in the TLB tag to permit the option of a 


forced miss or hit during a TLB lookup operation. The forced miss or hit occurs regardless of the 
state of the actual bit in the TLB. The meaning of these pairs of bits is given in Table B-4. 


The operation bit in TR6 determines whether the TLB test operation is a write or a lookup. The 
function of the operation bit is given in Table B-S. 


Table B-4. Meaning of a Pair of TR6 Protection Bits 


TR6 Protection Bit TR6 Protection Bit# Meaning on Meaning on 
(B) (B#) TLB Write Operation TLB Lookup Operation 
0 0 Undefined Miss any TLB TAG Bit B 
0 1 Write 0 to TLB TAG Bit B Match TLB TAG Bit B if 0 
1 0 Write 1 to TLB TAG Bit B Match TLB TAG Bit B if 1 
1 1 Undefined Match any TLB TAG Bit B 
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Table B-5. TR6 Operation Bit Encoding 


TR6 Bit 0 TLB Operation to Be Performed 
0 TLB Write 
1 TLB Lookup 


12.3.1.2 Data Test Register: TR7 


TR7 contains the information stored or read from the data block during a TLB test operation. Be- 
fore a TLB test write, TR7 contains the physical address and the page attribute bits to be stored 
in the entry. After a TLB test lookup hit, TR7 contains the physical address, page attributes, LRU 
bits and entry location from the access. 


TR7 contains a 20-bit physical address (bits 31:12), PLD bit (bit 11), PWT bit (bit 10), and three 
bits for the LRU bits (bits 9:7). The LRU bits in TR7 are only used during a TLB lookup test. The 
functionality of TR7 bit 4 differs for TLB writes and lookups. The encoding of bit 4 is defined in 
Table B-6 and Table B-7. Finally, TR7 contains two bits (bits 3:2) to specify a TLB replacement 
pointer or the location of a TLB hit. 


Table B-6. Encoding of Bit 4 of TR7 on Writes 


TR7 Bit 4 Replacement Pointer Used on TLB Write 
0 Pseudo-LRU Replacement Pointer 
1 Data Test Register Bits 3:2 


A replacement pointer is used during a TLB write. The pointer indicates which of the four entries 
in an accessed set is to be written. The replacement pointer can be specified to be the internal LRU 
bits or bits 3:2 in TR7. The source of the replacement pointer is specified by TR7 bit 4. The en- 
coding of bit 4 during a write is given by Table B-6. 


Note that both testability writes and lookups affect the state of the internal LRU bits regardless 
of the replacement pointer used. All TLB write operations (testability or normal operation) cause 
the written entry to become the most recently used. For example, during a testability write with 
the replacement pointer specified by TR7 bits 3:2, the indicated entry is written and that entry 
becomes the most recently used as specified by the internal LRU bits. 


There are two TLB testing operations: write entries into the TLB, and perform TLB lookups. One 
major enhancement over TLB testing in the Intel386™ processor is that paging need not be dis- 
abled while executing testability writes or lookups. 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


Note that any time one TLB set contains the same linear address in more than one of its entries, 
looking up that linear address gives unpredictable results. Therefore a single linear address 
should not be written to one TLB set more than once. 


Table B-7. Encoding of Bit 4 of TR7 on Lookups 


TR7 Bit 4 Meaning after TLB Lookup Operation 
0 TLB Lookup Resulted in a Miss 
fl TLB Lookup Resulted in a Hit 


B.3.3. TLB Write Test 


To perform a TLB write TR7 must be loaded followed by a TR6 load. The register operations 
must be performed in this order because the TLB operation is triggered by the write to TR6. 


TR7 is loaded with a 20-bit physical address and values for PCD and PWT to be written to the 
data portion of the TLB. In addition, bit 4 of TR7 must be loaded to indicate whether to use TR7 
bits 3-2 or the internal LRU bits as the replacement pointer on the TLB write operation. Note that 
the LRU bits in TR7 are not used in a write test. 


TR6 must be written to initiate the TLB write operation. Bit 0 in TR6 must be reset to zero to 
indicate a TLB write. The 20-bit linear address and the seven page protection bits must also be 
written in TR6 to specify the tag portion of the TLB entry. Note that the three least significant 
bits of the linear address specify which of the eight sets in the data block is loaded with the phys- 
ical address data. Thus only 17 of the linear address bits are stored in the tag array. 


B.3.4 TLB Lookup Test 


To perform a TLB lookup it is only necessary to write the proper tags and control information 
into TR6. Bit 0 in TR6 must be set to | to indicate a TLB lookup. TR6 must be loaded with a 20- 
bit linear address and the seven protection bits. To force misses and matches of the individual pro- 
tection bits on TLB lookups, set the seven protection bits as specified in Table B-4. 


A TLB lookup operation is initiated by the write to TR6. TR7 indicates the result of the lookup 
operation following the write to TR6. The hit/miss indication can be found in TR7 bit 4 (see Table 
B-7). 


TR7 contains the following information if bit 4 indicates that the lookup test resulted in a hit. 
Bits 3:2 specify the set in which the match occurred. The 22 most significant bits in TR7 contain 
the physical address and page attributes contained in the entry. Bits 9:7 contain the LRU bits as- 
sociated with the accessed set. The state of the LRU bits is does not reflect their being updated 
for the current lookup. 


When bit 4 in TR7 indicates that the lookup test resulted in a miss, the remaining bits in TR7 are 
undefined. 


Again it should be noted that a TLB testability lookup operation affects the state of the LRU bits. 
The LRU bits are updated if a hit occurs. The entry which was hit becomes the most recently used. 
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B.4 THREE-STATE OUTPUT TEST MODE 


The Intel486 processor provides the ability to float all its outputs and bidirectional pins, except 
for the VOLDET pin in the IntelDX4 processor. This includes all pins floated during bus hold as 
well as pins which are never floated in normal operation of the chip (HLDA, BREQ, FERR# and 
PCHK#). When the Intel486 processor is in the three-state output test mode external testing can 
be used to test board connections. 


The three-state test mode is invoked if FLUSH# is sampled active at the falling edge of RESET. 
FLUSH# is an asynchronous signal. When driven, FLUSH# should be asserted for two clocks be- 
fore and 2 clocks after RESET is de-asserted. When FLUSH# is driven synchronously, the three- 
state output test mode is initiated by driving FLUSH# so that it is sampled active in the clock prior 
to RESET going low and ensuring that specified setup and hold times are met. The outputs are 
guaranteed to three-state no later than 10 clocks after RESET goes low (see Figure B-4). The 
Intel486 processor remains in the three-state test mode until the next RESET. 


B.5 Intel486™ PROCESSOR BOUNDARY SCAN (JTAG) 


The Intel486 processor provides additional testability features compatible with the IEEE Stan- 
dard Test Access Port and Boundary Scan Architecture (IEEE Std. 1149.1). (Note that the 
Intel486 SX processor in PGA package does not have JTAG capability.) The test logic provided 
allows for testing to insure that components function correctly, that interconnections between 
various components are correct, and that various components interact correctly on the printed cir- 
cuit board. 


The boundary scan test logic consists of a boundary scan register and support logic that are ac- 
cessed through a test access port (TAP). The TAP provides a simple serial interface that makes it 
possible to test all signal traces with only a few probes. 


The TAP can be controlled via a bus master. The bus master can be either automatic test equip- 
ment or a component (PLD) that interfaces to the four-pin test bus. 


B.5.1_ Boundary Scan Architecture 
The boundary scan test logic contains the following elements: 
¢ Test access port (TAP), consisting of input pins TMS, TCK, and TDI; and output pin TDO. 


¢ TAP controller, which interprets the inputs on the test mode select (TMS) line and performs 
the corresponding operation. The operations performed by the TAP include controlling the 
instruction and data registers within the component. 


¢ Instruction register (IR), which accepts instruction codes shifted into the test logic on the 
test data input (TDI) pin. The instruction codes are used to select the specific test operation 
to be performed or the test data register to be accessed. 


¢ Test data registers: The Intel486 processor contains three test data registers: Bypass register 
(BPR), Device Identification register (DID), and Boundary Scan register (BSR). 


| B-13 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


The instruction and test data registers are separate shift-register paths connected in parallel and 
have a common serial data input and a common serial data output connected to the TAP signals, 
TDI and TDO, respectively. 


B.5.2 Data Registers 


The Intel486 processor contains the two required test data registers; bypass register and boundary 
scan register. In addition, they also have a device identification register. 


Each test data register is serially connected to TDI and TDO, with TDI connected to the most sig- 
nificant bit and TDO connected to the least significant bit of the test data register. 


Data is shifted one stage (bit position within the register) on each rising edge of the test clock 
(TCK). In addition the Intel486 processor contains a runbist register to support the RUNBIST 
boundary scan instruction. 


B.5.2.1 Bypass Register 


The Bypass Register is a one-bit shift register that provides the minimal length path between TDI 
and TDO. This path can be selected when no test operation is being performed by the component 
to allow rapid movement of test data to and from other components on the board. While the by- 
pass register is selected data is transferred from TDI to TDO without inversion. 


B.5.2.2 Boundary Scan Register 


The Boundary Scan Register is a single shift register path containing the boundary scan cells that 
are connected to all input and output pins of the Intel486 processor. Figure B-7 shows the logical 
structure of the boundary scan register. While output cells determine the value of the signal driven 
on the corresponding pin, input cells only capture data; they do not affect the normal operation 
of the device. Data is transferred without inversion from TDI to TDO through the boundary scan 
register during scanning. The boundary scan register can be operated by the EXTEST and SAM- 
PLE instructions. The boundary scan register order is described in Section B.5.5, “Boundary Scan 
Register Bits and Bit Orders.” 
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Figure B-7. Logical Structure of Boundary Scan Register 


B.5.2.3 Device Identification Register 


The Device Identification Register contains the manufacturer’s identification code, part number 
code, and version code. Table B-8 lists the codes corresponding to the Intel486 processor. 


B.5.2.4 Runbist Register 


The Runbist Register is a one bit register used to report the results of the Intel486 processor BIST 
when it is initiated by the RUNBIST instruction. This register is loaded with a “1” prior to invok- 
ing the BIST and is then loaded with “0” upon successful completion. 


B.5.3 Instruction Register 


The Instruction Register (IR) allows instructions to be serially shifted into the device. The in- 
struction selects the particular test to be performed, the test data register to be accessed, or both. 
The instruction register is four (4) bits wide. The most significant bit is connected to TDI and the 
least significant bit is connected to TDO. There are no parity bits associated with the Instruction 
register. Upon entering the Capture-IR TAP controller state, the Instruction register is loaded 
with the default instruction “0001,” SAMPLE/PRELOAD. Instructions are shifted into the in- 
struction register on the rising edge of TCK while the TAP controller is in the SHIFT-IR state. 


| B-15 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


B.5.3.1 Boundary Scan Instruction Set 


The Intel486 processor supports all three mandatory boundary scan instructions (BYPASS, 
SAMPLE/PRELOAD, and EXTEST) along with two optional instructions IDCODE and RUN- 
BIST). Table B-9 lists the Intel486 processor boundary scan instruction codes. The instructions 
listed as PRIVATE cause TDO to become enabled in the Shift-DR state and cause “0” to be shift- 
ed out of TDO on the rising edge of TCK. Execution of the PRIVATE instructions does not cause 
hazardous operation of the Intel486 processor. 


¢ EXTEST: The instruction code is “0000.” The EXTEST instruction allows testing of 
circuitry external to the component package, typically board interconnects. It does so by 
driving the values loaded into the Intel486 processor's boundary scan register out on the 
output pins corresponding to each boundary scan cell and capturing the values on Intel486 
processor input pins to be loaded into their corresponding boundary scan register locations. 
I/O pins are selected as input or output, depending on the value loaded into their control 
setting locations in the boundary scan register. Values shifted into input latches in the 
boundary scan register are never used by the internal logic of the Intel486 processor. 


Table B-8. Boundary Scan Component Identification Codes 


Vv Intel 
Processor _& ; , MFG ID 1st Boundary 
Type veer ne Family | Model | jntel=009H | Bit | Scan ID (Hex) 
Intel486%o SX | 000001 0100 00010 00000001001 1 x8282013H 
processor (3.3V) 
Intel486™ SX 1 000001 0100 00010 00000001001 1 x8282013H 
processor 
(3.3V, 2X CLK) 
Intel486™ SX 0 000001 0100 00010 00000001001 1 x0282013H 
processor 
(5V) 
Intel486™ SX 0 000001 0100 00010 00000001001 1 x0282013H 
processor 
(5V, 2X CLK) 
Intel486™ DX 1 000001 0100 00001 00000001001 1 x8281013H 
processor (3.3V) 
Intel486™ DX 1 000001 0100 00001 00000001001 1 x8281013H 
processor 
(3.3V, 2X CLK) 
Intel486™ DX 0 000001 0100 00001 00000001001 1 x0281013H 
processor 
(5V) 
Intel486™ DX 0 000001 0100 00001 00000001001 1 x0281013H 
processor 
(5V, 2X CLK) 


¥Contact Intel for details 
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Table B-8. Boundary Scan Component Identification Codes (Continued) 


Processor MFG ID Boundary 
Type Intel=009H Scan ID (Hex) 


IntelDX2™ 00000001001 x8285013H 
processor (3.3V) 
IntelDX2™ 00000001001 x0285013H 
processor (5V) 


IntelDX4™ 00000001001 x8288013H 
processor (3.3V) 


Write-Back 00000001001 X8289013H 

Enhanced 

IntelDX4™ 
processor (3.3V) 


*Contact Intel for details 


Table B-9. Boundary Scan Instruction Codes 


Instruction Code Instruction Name 
0000 EXTEST 
0001 SAMPLE 
0010 IDCODE 
0011 PRIVATE 
0100 PRIVATE 
0101 PRIVATE 
0110 PRIVATE 
0111 PRIVATE 
1000 RUNBIST 
1001 PRIVATE 
1010 PRIVATE 
1011 PRIVATE 
1100 PRIVATE 
1101 PRIVATE 
1110 PRIVATE 
1111 BYPASS 


NOTE: After using the EXTEST instruction, the 
Intel486 processor must be reset before 
normal (non-boundary scan) use. 
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SAMPLE/PRELOAD: The instruction code is “0001.” The SAMPLE/PRELOAD has two 
functions that it performs. When the TAP controller is in the Capture-DR state, the 
SAMPLE/PRELOAD instruction allows a “snap-shot” of the normal operation of the 
component without interfering with that normal operation. The instruction causes boundary 
scan register cells associated with outputs to sample the value being driven by the Intel486 
processor. It causes the cells associated with inputs to sample the value being driven into 
the Intel486 processor. On both outputs and inputs the sampling occurs on the rising edge of 
TCK. When the TAP controller is in the Update-DR state, the SAMPLE/PRELOAD 
instruction preloads data to the device pins to be driven to the board by executing the 
EXTEST instruction. Data is preloaded to the pins from the boundary scan register on the 
falling edge of TCK. 


IDODE: The instruction code is “0010.” The IDCODE instruction selects the device 
identification register to be connected to TDI and TDO, allowing the device identification 
code to be shifted out of the device on TDO. Note that the device identification register is 
not altered by data being shifted in on TDI. 


BYPASS: The instruction code is “1111.” The BYPASS instruction selects the bypass 
register to be connected to TDI and TDO, effectively bypassing the test logic on the 
Intel486 processor by reducing the shift length of the device to one bit. Note that an open 
circuit fault in the board level test data path causes the bypass register to be selected 
following an instruction scan cycle due to the pull-up resistor on the TDI input. This has 
been done to prevent any unwanted interference with the proper operation of the system 
logic. 


RUNBIST: The instruction code is “1000.” The RUNBIST instruction selects the one (1) 
bit runbist register, loads a value of “1” into the runbist register, and connects it to TDO. It 
also initiates the built-in self test (BIST) feature of the Intel486 processor, which is able to 
detect approximately 60% of the stuck-at faults on the Intel486 processor. The Intel486 
processor ac/dc specifications for Vcc and CLK must be met and RESET must have been 
asserted at least once prior to executing the RUNBIST boundary scan instruction. After 
loading the RUNBIST instruction code in the instruction register, the TAP controller must 
be placed in the Run-Test/Idle state. BIST begins on the first rising edge of TCK after 
entering the Run-Test/Idle state. The TAP controller must remain in the Run-Test/Idle state 
until BIST is completed. It requires 1.2 million clock (CLK) cycles to complete BIST and 
report the result to the runbist register. After completing the 1.2 million clock (CLK) cycles, 
the value in the runbist register should be shifted out on TDO during the Shift-DR state. A 
value of “0” being shifted out on TDO indicates BIST successfully completed. A value of 
“1” indicates a failure occurred. After executing the RUNBIST instruction, the Intel486 
processor must be reset prior to normal operation. 
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B.5.4 Test Access Port (TAP) Controller 


The TAP controller is a synchronous, finite state machine. It controls the sequence of operations 
of the test logic. The TAP controller changes state only in response to the following events: 


1. Arising edge of TCK 
2. Power-up. 


The value of the test mode state (TMS) input signal at a rising edge of TCK controls the sequence 
of the state changes. The state diagram for the TAP controller is shown in Figure B-6. Test de- 
signers must consider the operation of the state machine in order to design the correct sequence 
of values to drive on TMS. 
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Figure B-8. TAP Controller State Diagram 
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B.5.4.1 Test-Logic-Reset State 


In this state, the test logic is disabled so that normal operation of the device can continue unhin- 
dered. This is achieved by initializing the instruction register such that the IDCODE instruction 
is loaded. No matter what the original state of the controller, the controller enters Test-Logic-Re- 
set state when the TMS input is held high (1) for at least five rising edges of TCK. The controller 
remains in this state while TMS is high. The TAP controller is also forced to enter this state at 
power-up. 


B.5.4.2 Run-Test/Idle State 


A controller state between scan operations. Once in this state, the controller remains in this state 
as long as TMS is held low. In devices supporting the RUNBIST instruction, the BIST is per- 
formed during this state and the result is reported in the runbist register. For instruction not caus- 
ing functions to execute during this state, no activity occurs in the test logic. The instruction 
register and all test data registers retain their previous state. When TMS is high and a rising edge 
is applied to TCK, the controller moves to the Select-DR state. 


B.5.4.3 Select-DR-Scan State 


This is a temporary controller state. The test data register selected by the current instruction re- 
tains its previous state. If TMS is held low and a rising edge is applied to TCK when in this state, 
the controller moves into the Capture-DR state, and a scan sequence for the selected test data reg- 
ister is initiated. If TMS is held high and a rising edge is applied to TCK, the controller moves to 
the Select-IR-Scan state. 


The instruction does not change in this state. 


B.5.4.4 Capture-DR State 


In this state, the boundary scan register captures input pin data if the current instruction is EX- 
TEST or SAMPLE/PRELOAD. The other test data registers, which do not have parallel input, 
are not changed. 


The instruction does not change in this state. 


When the TAP controller is in this state and a rising edge is applied to TCK, the controller enters 
the Exit1-DR state if TMS is high or the Shift-DR state if TMS is low. 


B.5.4.5 Shift-DR State 


In this controller state, the test data register connected between TDI and TDO as a result of the 
current instruction shifts data one stage toward its serial output on each rising edge of TCK. 


The instruction does not change in this state. 


When the TAP controller is in this state and a rising edge is applied to TCK, the controller enters 
the Exitl-DR state if TMS is high or remains in the Shift-DR state if TMS is low. 
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B.5.4.6 Exit1-DR State 


This is a temporary state. While in this state, if TMS is held high, a rising edge applied to TCK 
causes the controller to enter the Update-DR state, which terminates the scanning process. If TMS 
is held low and a rising edge is applied to TCK, the controller enters the Pause-DR state. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 


B.5.4.7 Pause-DR State 


The pause state allows the test controller to temporarily halt the shifting of data through the test 
data register in the serial path between TDI and TDO. An example of using this state could be to 
allow a tester to reload its pin memory from disk during application of a long test sequence. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 


The controller remains in this state as long as TMS is low. When TMS goes high and a rising edge 
is applied to TCK, the controller moves to the Exit2-DR state. 


B.5.4.8 Exit2-DR State 


This is a temporary state. While in this state, if TMS is held high, a rising edge applied to TCK 
causes the controller to enter the Update-DR state, which terminates the scanning process. If TMS 
is held low and a rising edge is applied to TCK, the controller enters the Shift-DR state. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 


B.5.4.9 Update-DR State 


The boundary scan register is provided with a latched parallel output to prevent changes at the 
parallel output while data is shifted in response to the EXTEST and SAMPLE/PRELOAD in- 
structions. When the TAP controller is in this state and the boundary scan register is selected, data 
is latched onto the parallel output of this register from the shift-register path on the falling edge 
of TCK. The data held at the latched parallel output does not change other than in this state. 


All test data registers selected by the current instruction retains its previous value during this 
state. The instruction does not change in this state. 


B.5.4.10 Select-IR-Scan State 


This is a temporary controller state. The test data register selected by the current instruction re- 
tains its previous value. If TMS is held low and a rising edge is applied to TCK when in this state, 
the controller moves into the Capture-IR state, and a scan sequence for the instruction register is 
initiated. If TMS is held high and a rising edge is applied to TCK, the controller moves to the 
Test-Logic-Reset state. 


The instruction does not change in this state. 
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B.5.4.11 Capture-IR State 


In this controller state the shift register contained in the instruction register loads the fixed value 
“0001” on the rising edge of TCK. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. When the controller is in this state and a rising edge 
is applied to TCK, the controller enters the Exitl-IR state if TMS is held high, or the Shift-IR 
state if TMS is held low. 

B.5.4.12 Shift-IR State 


In this state the shift register contained in the instruction register is connected between TDI and 
TDO and shifts data one stage towards its serial output on each rising edge of TCK. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 


When the controller is in this state and a rising edge is applied to TCK, the controller enters the 
Exitl-IR state if TMS is held high, or remains in the Shift-IR state if TMS is held low. 


B.5.4.13  Exit1-IR State 


This is a temporary state. While in this state, if TMS is held high, a rising edge applied to TCK 
causes the controller to enter the Update-IR state, which terminates the scanning process. If TMS 
is held low and a rising edge is applied to TCK, the controller enters the Pause-IR state. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 


B.5.4.14 Pause-IR State 


The pause state allows the test controller to temporarily halt the shifting of data through the in- 
struction register. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 


The controller remains in this state as long as TMS is low. When TMS goes high and a rising edge 
is applied to TCK, the controller moves to the Exit2-IR state. 


B.5.4.15 Exit2-IR State 


This is a temporary state. While in this state, if TMS is held high, a rising edge applied to TCK 
causes the controller to enter the Update-IR state, which terminates the scanning process. If TMS 
is held low and a rising edge is applied to TCK, the controller enters the Shift-IR state. 


The test data register selected by the current instruction retains its previous value during this state. 
The instruction does not change in this state. 
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B.5.4.16 | Update-IR State 


The instruction shifted into the instruction register is latched onto the parallel output from the 
shift-register path on the falling edge of TCK. Once the new instruction has been latched, it be- 
comes the current instruction. 


Test data registers selected by the new current instruction retain the previous value. 


B.5.5 Boundary Scan Register Bits and Bit Orders 


The boundary scan register contains a cell for each pin, as well as cells for control of I/O and 
three-state pins. 


B.5.5.1 Intel486™ SX Processor Boundary Scan Register Bits 


The following is the bit order of the Intel486 SX processor boundary scan register (from left to 
right and top to bottom. See notes below): 


TDO¢e A2, A3, A4, A5, RESERVED#, A6, A7, A8, A9, A10, All, A12, A13, Al4, A15, A16, 
A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, DPO, DO, D1, 
D2, D3, D4, D5, D6, D7, DP1, D8, D9, D10, D11, D12, D13, D14, D15, DP2, D16, D17, D18, 
D19, D20, D21, D22, D23, DP3, D24, D25, D26, D27, D28, D29, D30, D31, STPCLK#, Re- 
served SMI#, SMIACT#, SRESET, NMI, INTR, FLUSH#, RESET, A20M#, EADS#, PCD, 
PWT, D/C#, M/IO#, BE3#, BE2#, BE1#, BEO#, BREQ, W/R#, HLDA, CLK, Reserved, 
AHOLD, HOLD, KEN#, RDY#, BS8#, BS16#, BOFF#, BRDY#, PCHK#, LOCK#, PLOCK#, 
BLAST#, ADS#, MISCCTL, BUSCTL, ABUSCTL, WRTL TDI 


B.5.5.2 IntelDX2™ Processor Boundary Scan Register Bits 


The following is the bit order of the IntelDX2 processor boundary scan register (from left to right 
and top to bottom. See notes below): 


TDO¢e A2, A3, A4, A5, RESERVED#, A6, A7, A8, A9, A10, All, A12, A13, Al4, A15, A16, 
A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, DPO, DO, D1, 
D2, D3, D4, D5, D6, D7, DP1, D8, D9, D10, D11, D12, D13, D14, D15, DP2, D16, D17, D18, 
D19, D20, D21, D22, D23, DP3, D24, D25, D26, D27, D28, D29, D30, D31, STPCLK#, 
IGNNE#, FERR#, SMI#, SMIACT#, SRESET, NMI, INTR, FLUSH#, RESET, A20M#, 
EADS#, PCD, PWT, D/C#, M/IO#, BE3#, BE2#, BE1#, BEO#, BREQ, W/R#, HLDA, CLK, Re- 
served, AHOLD, HOLD, KEN#, RDY#, BS8#, BS16#, BOFF#, BRDY#, PCHK#, LOCK#, 
PLOCK#, BLAST#, ADS#, MISCCTL, BUSCTL, ABUSCTL, WRTL < TDI 
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B.5.5.3 IntelDX4™ Processor Boundary Scan Register Bits 


The following is the bit order of the IntelDX4 processor boundary scan register (from left to right 
and top to bottom. See notes below): 


TDO€ A2, A3, A4, AS, RESERVED#, A6, A7, A8, A9, A10, All, A1l2, A13, Al4, A15, A16, 
A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, DPO, DO, D1, 
D2, D3, D4, D5, D6, D7, DP1, D8, D9, D10, D11, D12, D13, D14, D15, DP2, D16, D17, D18, 
D19, D20, D21, D22, D23, DP3, D24, D25, D26, D27, D28, D29, D30, D31, STPCLK#, 
IGNNE#, FERR#, SMI#, SMIACT#, SRESET, NMI, INTR, FLUSH#, RESET, A20M#, 
EADS#, PCD, PWT, D/C#, M/IO#, BE3#, BE2#, BE1#, BEO#, BREQ, W/R#, HLDA, CLK, 
AHOLD, HOLD, KEN#, RDY#, CLKMUL, BS8#, BS16#, BOFF#, BRDY#, PCHK#, LOCK#, 
PLOCK#, BLAST#, ADS#, MISCCTL, BUSCTL, ABUSCTL, WRTL < TDI 


B.5.5.4 Write-Back Enhanced IntelDX4™ Processors Boundary Scan Register Bits 


The following is the bit order of the Write-Back Enhanced IntelDX4 processor boundary scan 
register (from left to right and top to bottom. See notes below): 


TDO€ A2, A3, A4, AS, RESERVED#, A6, A7, A8, A9, A10, All, A1l2, A13, Al4, A15, A16, 
A17, A18, A19, A20, A21, A22, A23, A24, A25, A26, A27, A28, A29, A30, A31, DPO, DO, D1, 
D2, D3, D4, D5, D6, D7, DP1, D8, D9, D10, D11, D12, D13, D14, D15, DP2, D16, D17, D18, 
D19, D20, D21, D22, D23, DP3, D24, D25, D26, D27, D28, D29, D30, D31, STPCLK#, 
IGNNE#, INV, CACHE#, FERR#, SMI#, WB/WT#, HITM#, SMIACT#, SRESET, NMI, INTR, 
FLUSH#, RESET, A20M#, EADS#, PCD, PWT, D/C#, M/IO#, BE3, BE2, BE1, BEO, BREQ, 
W/R#, HLDA, CLK, AHOLD, HOLD, KEN#, RDY#, CLKMUL, BS8#, BS16#, BOFF#, 
BRDY#, PCHK#, LOCK#, PLOCK#, BLAST#, ADS#, MISCCTL, BUSCTL, ABUSCTL, 
WRcTL < TDI 


NOTES 


“Reserved” corresponds to no connect “NC” or “INC” signals on the Intel486 processor. 


All the *CTL cells are control cells that are used to select the direction of bidirectional pins or 
three-state output pins. When “1” is loaded into the control cell (*CTL), the associated pin(s) are 
three-stated or selected as input. The following lists the control cells and their corresponding pins. 


1. WRCTL controls the D[31:0] and DP[3:0] pins. 
2. ABUSCTL controls the A[31:2] pins. 


3. BUSCTL controls the ADS#, BLAST#, PLOCK#, LOCK#, WR#, BEO#, BE1#, BE2#, 
BE3#, MIO#, DC#, PWT, and PCD pins. 


4. MISCCTL controls the PCHK#, HLDA, and BREQ pins. 
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B.5.6 TAP Controller Initialization 


The TAP controller is automatically initialized when a device is powered up. In addition, the TAP 
controller can be initialized by applying a high signal level on the TMS input for five TCK peri- 
ods. 


B.5.7 Boundary Scan Description Language (BSDL) Files 
See Appendix D for an example of a BSDL file for Intel486 processors. 
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APPENDIX C 
ADVANCED FEATURES 


Some non-essential information regarding the Intel486™ processor is considered Intel confiden- 
tial and proprietary and is not documented in this publication. This information is provided in the 
Supplement to the Pentium® Processor Family Developer’s Manual (order #241428) and is 
available with the appropriate non-disclosure agreements in place. Please contact Intel Corpora- 
tion for details. 


The Supplement to the Pentium® Processor Family Developer’s Manual contains architecture 
extensions for the Intel486 and Pentium processors that are confidential and non-essential for 
standard applications. These extensions include low-level registers that provide access to features 
such as page extensions, Virtual Mode extensions, testing, and performance monitoring. 


This information is specifically targeted at software developers who develop the following types 
of low-level software: 


© Operating system kernels 

¢ Virtual memory managers 

¢ BIOS and processor test software 
¢ Performance monitoring tools 


For software developers designing other categories of software, this information does not apply. 
All of the required program development details are provided in the Intel486™ Microprocessor 
Family Programmer's Reference Manual. To obtain this document, contact the Intel Corporation 
Literature Center at: 


Intel Corporation 
P.O. Box 5937 
Denver, CO 80217-9808 


1-800-548-4725 
(Reference Order Number 240486) 
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APPENDIX D 
FEATURE DETERMINATION 


D.1 CPUID INSTRUCTION 


The Intel486™ processor implements the CPUID instruction to make information available to the 
system software about the family, model, and stepping of the processor. Support of this instruc- 
tion is indicated by the ability of system software to write and read the bit in position 
EFLAGS.21, referred to as the EFLAGS.ID bit. The actual state of the EFLAGS.ID bit is irrele- 
vant to the hardware. This bit is reset to zero upon device reset (RESET and SRESET) for com- 
patibility with older Intel486 processor designs. 


D.2> OPERATION 


The CPUID instruction requires the software developer to pass an input parameter to the proces- 
sor in the EAX register. The processor response is returned in registers EAX, EBX, ECX, and 
EDX. 


1. When the parameter passed to EAX is zero, the register values returned upon instruction 
execution are: 


EAX[31:0] < 1 

EBX[31:0] <756E6547—’Genu’,, with “G” in the low nibble of BL 
EDX[31:0] < 49656E69—inel!’, with “i” in the low nibble of DL 
ECX[31:0] <6C65746E—"ntel’”, with “n” in the low nibble of CL 


The values in EBX, ECX, and EDX indicate an Intel processor. When taken in the proper 
order, they decode to the string “GenuinelIntel.” 
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2. When the parameter passed to EAX is 1, the register values returned upon instruction 
execution are: 


EAX[3:0] <-xxxx—Stepping ID 

EAX[7:4] <—xxxx—Model (See Table D-2.) 

EAX[11:8] <— 0100—Family 

EAX[15:12] — 0000 

EAX[31:16] <Intel Reserved 

EBX[31:0] < 00000000 

ECX[31:0] <— 00000000 

EDX[0:0] <— 1—FPU on-chip 

EDX[3:1] < 1—For more information on these bits, see Appendix A 
EDX[31:4] < Intel Reserved 


The value returned in EAX after CPUID instruction execution is identical to the value 
loaded into EDX upon device reset. Software must avoid any dependency upon the state 
of reserved processor bits. 


3. When the parameter in EAX is greater than one, the register values returned upon 
instruction execution are: 


EAX[31:0] <— 00000000 
EBX[31:0] < 00000000 
EDX[31:0] < 00000000 
ECX[31:0] <-00000000 


D.3 FLAGS AFFECTED 
No flags are affected. 


D.4 EXCEPTIONS 


None. 


D.5 FOR MORE INFORMATION 


Refer to the Intel application note AP-485, Intel Processor Identification with the CPUID In- 
struction, for more details. 
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Table D-1. CPUID Instruction Description 


FEATURE DETERMINATION 


OPCODE Instruction kei ills EAX Input Value Description 
OF A2 CPUID 14 1 Processor Identification 
9 0 or greater than 1 Intel String/Null Registers 
Table D-2. Intel486™ Processor Signatures 

Family Model Stepping’ Description 

0100 0000 and 0001 XXXX Intel486™ DX Processors 

0100 0010 XXXX Intel486 SX Processors 

0100 0011 XXXX Intel487™ Processors? 

0100 0011 XXXX IntelDX2™ and InteIDX2 OverDrive® Processors 
0100 0100 XXXX Intel486 SL Processor® 

0100 0101 XXXX IntelSX2™ Processors 

0100 1000 XXXX IntelDX4™ and IntelDX4 OverDrive Processors 
0100 1001 XXXX Write-Back Enhanced IntelDX4 Processor 

0101 0101 XXXX Reserved for Pentium® OverDrive Processor for 

InteIDX4 Processor 
NOTES: 


1. Intel releases information about stepping numbers as needed. 
2. This processor does not implement the CPUID instruction. 
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APPENDIX E 
/O BUFFER MODELS 


For processor bus speeds above 33 MHz (e.g., 50 MHz), the capacitive derating curves are not 
guaranteed. For bus speeds of 50 MHz, I/O buffer modeling techniques should be used to accu- 
rately simulate (and predict) the behavior of processor signals in a particular environment. 


This appendix presents sample I/O buffer model parameters for the Write-Back Enhanced 
IntelIDX4™ processor. A text listing of the data is presented in the IBIS format. 


V/O buffer model information is available for all Intel486™ processors described in this 
datasheet. Contact your Intel representative for the latest I/O buffer models for the Write-Back 
Enhanced IntelDX4 processor and other members of the Intel486 processor family. 


E.1. SAMPLE IBIS FILES FOR THE WRITE-BACK ENHANCED IntelIDX4™ 
PROCESSOR 


The following pages present sample IBIS file outputs for the Write-Back Enhanced IntelDX4 
processor. 


E.2 SAMPLE TEXT LISTING OF IBIS FILES FOR THE WRITE-BACK ENHANCED 
IntelDX4™ PROCESSOR 


KKKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK 


IBIS Ver] de 

File name] WBDX4PGA.ibs 

File Rev] 1.0 

Date] 5/13/95 

Source] File originated at Intel Corporation, mainly as an example 
of a proper IBIS ASCII 1.0 file. However, data is believed 
to be accurate. 

Note] Pull-up, Pull-down VI curves, ramp times, and ESD 


taken from a lab measurement on silicon. 
Packaging values taken from a package characteristic table 
from Intel (Chandler). Power supply is 3.3V with 5V tolerance 


Disclaimer] This information is for modeling purposes only, and 
is not guaranteed. 
KKKEK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK 
Component] CPU 
Manufacturer] Intel 
Package] 
typ min max 
R_pkg 2329m 728m 3930m | not in databook 
L_pkg 17.79nH 8.56nH 27.01nH 
C_pkg 6.03pF 1.89pF 10.16pF 
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KREKKKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKKKKKKK 


[Pin] signal_name model_name R_pin L_pin C_pin 
C03 CLK CLK 1248m 21.10n 9.79p 
D15 A20M [N3 1202m 10.47n 4.14p 
Sle ABUS10 [O02 1268m 12.22n 2.89p 
R12 ABUS [O02 1100m 9.94n 3.86p 
S07 ABUS12 [O02 1330m 11.14n 4.49p 
Q10 ABUS13 [02 860m 8.99n 1.81p 
S05 ABUS14 [02 816m 12.55n S208 
RO7 ABUS15 [02 832m 9.80n 2.08p 
Q09 ABUS16 [02 698m 7.86n 2715p 
Q03 ABUS17 [O02 2194m 17.37n 4.59p 
RO5 ABUS18 [O02 1376m 11.38n 4.62p 
Q04 ABUS19 [O02 1118m 11.39n 2.61p 
Q14 ABUS2 OUT1 1050m 9.68n 33120 
Q08 ABUS20 [02 756m 7.90n 23.7 1p 
Q05 ABUS21 [O 960m 10.51n 2..32p 
Q07 ABUS22 fe) 980m 9.32n 34539 
S03 ABUS23 (0) 494m 13.48n 3.30p 
Q06 ABUS24 LO 1030m 9.58n Se5 7p 
RO2 ABUS25 LO 1476m 13.38n 3257p 
S02 ABUS26 LO 1690m 13.01n 5.49p 
Sol ABUS27 (e) 1782m 15.08n 3. 83p 
RO1 ABUS28 (0) 1694m 13.03n S400 
P02 ABUS29 LO 1250m 12.12n 2.85p 
R15 ABUS3 OUT1 1314m 12.48n 2.97p 
P03 ABUS30 LO 1162m 10.26n 4.03p 
Q01 ABUS31 LO 1572m 13.9in 3.45p 
S16 ABUS4 [O02 1628m 12.68n 5...32p 
Q12 ABUS5 [O02 724m 9.20n 1.88p 
S15 ABUS6 [02 1482m 13.41n 3428p 
Q13 ABUS7 [O02 954m 9.18n 3.46p 
R13 ABUS8 [02 1064m 11.09n 2.51p 
Qll ABUS9 [O02 730m 8.02n 2.84p 
Siz ADS# OUT1 1716m 14.71n 3% T Lp 
Al7 AHOLD [IN2 2404m 18.530 4.98p 
K15 BEO# OUT2 1022m 9.54n 3.64p 
J16 BE1# OUT2 1062m 11.08n 2451p 
J15 BE2# OUT2 914m 8.98n 3.35p 
F17 BE3# OUT2 1296m 12.38n 2.94p 
R16 BLAST# OUT1 1482m 11.93n 4.91p 
D17 BOFF# [IN 2158m 15.44n 6.78p 
so4 BRDYC# [IN3 3230m 2 sen 6.50p 
H15 BRDY# [IN 1234m 12.04n 2.82p 
Q15 BREQ OUT2 2544m 19.31n 5.24p 
C17 BS16# [IN2 2574m 19.48n 5429p 
D16 BS8# [IN2 2514m 17.29n 7.76p 
B12 CACHE# OUT2 1154m 10.22n 4.01lp 
PO1 DBUSO LO 1496m 12.00n 4.95p 
NO2 DBUS1 LO 1168m 11.67n 2.70p 
E03 DBUS10 LO 1358m 11.28n 4.57p 
col DBUS11 LO 1758m 14.95n 2279p 
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G03 DBUS12 01 886m 8.83n 3.27p 
Do2 DBUS13 01 420m 3.07n 3.17p 
K03 DBUS14 01 152m 0.21n 4.00p 
FO3 DBUS15 01 860m 9.96n 2.13p 
J03 DBUS16 01 996m 0.71n 2.38p 
DO3 DBUS17 01 092m 9.90n 3.84p 
co2 DBUS18 01 346m 1.22n 4.54p 
BOl DBUS19 01 570m 3.90n 3.44p 
NO1 DBUS2 01 412m 1.56n 4.72p 
AO1 DBUS20 01 746m 3.30n 5.64p 
BO2 DBUS21 01 480m 3.40n 3.28p 
A02 DBUS22 01 628m 2.68n 5.32p 
AO4 DBUS23 01 294m 2.37n 2.93p 
A06 DBUS24 01 092m 1.25n 2.56p 
B06 DBUS25 01 064m 9.76n 3.76p 
C07 DBUS26 01 648m 8.78n 1.74p 
co6 DBUS27 01 862m 8.71n 3.20p 
cos DBUS28 01 596m 8.49n 1.65p 
AO8 DBUS29 01 1074m 9.81n 3.79p 
HO2 DBUS3 01 1744m 14.87n 3.76p 
cog DBUS30 01 578m 8.39n 1.61p 
B08 DBUS31 01 866m 8.73n 3.21p 
M03 DBUS4 01 972m 9.28n 3.51p 
J02 DBUS5 01 988m 0.67n 2.37p 
LO2 DBUS6 01 226m 0.60n 4.21p 
L03 DBUST 01 600m 4.07n 3.50p 
FO2 DBUS8 01 240m 2.07n 2.83p 
DO1 DBUS9 01 840m 3.79n 5.90p 
M15 D/C OUT2 474m 1.88n 4.89p 
No3 DPO ) 008m 0.78n 2.41p 
FOL DP1 01 528m 2.16n 5.04p 
HO3 DP2 01 812m 3.64n 5.83p 
A05 DP3 01 272m 0.84n 4.33p 
B17 EADS# N2 708m 4.67n 3.70p 
c14 FERR# ouUT2 612m 2.60n 5.27p 
C15 FLUSH N2 332m 1.15n 4.50p 
Al2 HITM# OUT2 182m 1.75n 2.73p 
P15 HLDA OUT2 918m 5.83n 4.08p 
E15 HOLD N2 576m 2.41n 5.17p 
A15 IGNNE# N3 852m 3.85n 5.94p 
Al6é INTR N3 782m 5.08n 3.83p 
A10 INV N2 002m 0.75n 2.40p 
F15 KEN# N2 436m 3.16n 3.19p 
N15 LOCK# OUT1 008m 9.46n 3.61p 
N16 M/IO OUT2 464m 7.76n 1.40p 
B15 NMI N3 450m 1.76n 4.83p 
L15 PWT ouT2 498m 3.50n 3.31p 
J17 PCD ouUT2 360m 1.29n 4.58p 
Q17 PCHK# OUT1 612m 2.60n 5.27p 
Q16 PLOCK# OUT1 320m 2.51n 2.98p 
F16 RDY# IN] 810m 3.63n 5.82p 
C16 RESET IN3 398m 2.95n 3.12p 
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C12 SMIACT# OUT2 834m 8.56n 3413p 
B10 SMI# IN3 790m 9.570 2.00p 
C10 SRESET IN3 596m 8.49n 1, bap 
G15 STPCLK# IN3 1792m 15.13n 3, 85p 
B13 WBWT IN2 1208m 10.50n 4.16p 
N17 W/R OUT1 1672m Le, Gin 5.44p 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KK 


The following model descriptions describe the different types of buffers. 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKKKKK KK 


OUT1 MODEL 

KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KK 

[Model] OUT1 

Model_type Output 

Polarity Non-Inverting 

Enable Active-High 

Signals BEO#, BE1#, BE2#, BE3#, BREQ#, D/C#, FERR#, HLDA, M/IO#, 

PCD, PWT 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKKKKKKEK 
typ min max 

C_comp 3.3pF 3.3pF 3.3pF 

[Voltage range] 32.3) 2.7V 3.7V 

KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KK 

[Pulldown] 
Voltage I (typ) I (min) I (max) 
-5V -2.27a -1.68a -2.6a 
-4.5V 21,4 99a -1.48a =2.284 
-4V -1.7la -1.27a -1.95a 
=350V -1.43a -1.07a si. 63a 
—3V a1 S138 -0.87a Sty3le 
=2.5V -0.87a -0.66a -0.98a 
a2y -0.59a -0.46a -0.66a 
a1l.5V -0.32a -0.27a =0,35a 
-1V -68.4ma -80.2ma -75.6ma 
-0.5V -23.2ma -14.6ma -32.2ma 
0 -7.2na -16.5na -12.7na 
0.5V 22.2ma 13.3ma 31.4ma 
1vV 40.5ma 23.5ma 58.5ma 
TeV. 52.7ma 29.5ma 78.7ma 
2V 58.34ma 30.97ma 90.41ma 
2.5V 59.39ma 31.39ma 94.46ma 
3V 59.88ma 31.69ma 95.32ma 
3.0V 60.2ma 31.8ma 95.76ma 
4V 60.39ma 31.91ma 96.66ma 
4.5V 60.55ma 32ma 96.47ma 
5V 60.67ma 32.08ma 96.66ma 
5.5V 60.79ma 32.16ma 96.82ma 
6V 60.89ma 32.23ma 96.96ma 
6.5V 60.99ma 32.3ma 97.1ma 
7V 61.09ma 32.37ma 97.23ma 
7.5V 61.19ma 32.44ma 97.36ma 
8V 61.29ma 32.51ma 97.48ma 


E-4 


intel. 


[GND_clamp] 


[Pullup] 


Voltage 
-1.0V 
+0, 9V 
-0.8V 
a EN 
-0.6V 
=0,, 5V 
-0.4V 
-0.1V 
0.0V 


Voltage 
8V 
7.5V 
6.5V 
6V 
5.0 
5V 
4.5V 
AV 
es af 
3V 
2.5V 
2vV 
1.5V 
1V 
0.5V 
Ov 

=O 5V 
Sy 
=. SV 
-2V 
=25V. 
=3V 
=345¥ 
-4V 
-4.5V 
=5V 

a sO 
—6V 
+§.5V 


cco TR ow Se er SO oe Te Bs Te er | 


Donn o BW DN 


I (min) 


Ee i a 
ese 
336.. 
-22. 
-11. 
=3:49 
-4.2 


OOmA 
54mA 
38mA 
OOmA 
OOmA 
2mA 

3mA 


0.00mA 
0.00mA 


I (min) 


21.8 
-1.4 
a1 42 
=2.0 
“0.8 
=0..6 
-0.4 
ee 


4 
3 
3 
2 
2 
2 
2 
2 


-84mA 
-51mA 
.78mA 
-8mA 

-14mA 
-16mA 
-48mA 


7.33nA 


16.9 
35.0 
52.9 
69.8 
83.3 
9962 
O..14 
ole 
+13 
els 
14 
old 
6.4 


wo OOo Oo 


8mA 
5mA 
2mA 
2mA 
4mA 
7mA 


5mA 


/O BUFFER MODELS 


I (max) 


=242 
=16 
=1L.2 
=049 
=006 
=0.63 
= OTe 
=D 4 
-94, 
“ORs 
+89 « 
HO 3% 
ard 
=22% 
+265 
83.4 
28.2 
58.8 
86.3 
Le pe ce 
+h3 
eS 
me 
ele 
.18 
Pa De 
ee 
«22 
Roe 


AES OE ES DS ED 


48mA 
62mA 
11mA 
13mA 
56mA 
82mA 
1imA 
1imA 
99mA 
1inA 
8mA 
8mA 
6mA 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


[POWER_clamp] 


Voltage I (typ) I (min) I (max) 

-2.7V 14.0mA 18.65mA 9.9mA 

-2.6V 11.2mA 14.90mA 7.5mA 

a2 8.0mA 10.90mA 5.0mA 

-2.4V 4.3mA 6.80mA 2.9mA 

-2.3V 1.6mA 2.80mA 1.0mA 

a2 .2V 0.0mA 0.00mA 0.OmA 

-2.1V 0.0mA 0.00mA 0.O0mA 

es 0.O0mA 0.00mA 0.0mA 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKKK 
[Ramp ] 

typ min max 
dv/dt_r 1.27/0.549n 0.87/1.044n 1.60/0.328n 
dv/dt_f 1.31/0.496n 0.83/0.855n 1.66/0.288n 
KREKKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKKKK 
OUT2 MODEL 
KREKKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKKKKK KK 
[Model] OUT2 
Model_type Output 
Polarity Non-Inverting 
Enable Active-High 
Signals A[2-3] 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKKKK 
typ min max 

C_comp 6.1pF 6.1pF 6.1pF 
[Voltage range] 3.3V 2.7V 3.7V 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKKKKK KK 
[Pulldown] 

Voltage I (typ) I (min) I (max) 

-5V -2.27a -1.68a -2.6a 

-4.5V =, 99a -1.48a -2.28a 

-4V -1.7la -1.27a -1.95a 

=3.5V -1.43a =1. 07a -1.63a 

—3V ena -0.87a ed Sle 

= 280 -0.87a -0.66a -0.98a 

m2 -0.59a -0.46a -0.66a 

=1,59 “0.324 0.274 =0.354 

-1V -68.4ma -80.2ma -75.6ma 

-0.5V -23.2ma -14.6ma —32.2ma 

0 -7.2na -16.5na -12.7na 

0.0 22.2ma 13.3ma 31.4ma 

1vV 40.5ma 23.5ma 58.5ma 

1.5V 52.7ma 29.5ma 78.7ma 

2V 58.34ma 30.97ma 90.41ma 

2.5V 59.39ma 31.39ma 94.46ma 

3V 59.88ma 31.69ma 95.32ma 

3.5V 60.2ma 31.8ma 95.76ma 

4v 60.39ma 31.91ma 96.66ma 

4.5V 60.55ma 32ma 96.47ma 

5V 60.67ma 32.08ma 96.66ma 

Dev. 60.79ma 32.16ma 96.82ma 


E-6 


intel. 


6V 
6.5V 
7V 
ree) s 
8V 


[GND_clamp] 


[Pullup] 


Voltage 
-1.0V 
20. 9V. 
-0.8V 
a EV 
-0.6V 
=0, 5V 
-0.4V 


KES: RD ED SD. I 


Daou ® WN 
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3ma 

37ma 
44ma 
51ma 
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-7 
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33nA 
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-O05mA 
-92mA 
.82mA 
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14 
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=16 
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at oe 
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58.8 
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218 
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RSS OS ED 


2 
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EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


[POWER_clamp] 


Voltage I (typ) I (min) I (max) 
-2.7V 14.0mA 18.65mA 9.9mA 
Fh OM, 11.2mA 14.90mA 7.5mA 
a2 8.0mA 10.90mA 5.0mA 
-2.4V 4.3mA 6.80mA 2.9mA 
S253 1.6mA 2.80mA 1.0mA 
a2 .2V 0.0mA 0.00mA 0.OmA 
-2.1V 0.0mA 0.00mA 0.O0mA 
es TF 0.O0mA 0.00mA 0.0mA 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKKKKKK 
Ramp] 
typ min max 
dv/dt_r 1.27/0.549n 0.87/1.044n 1.60/0.328n 
dv/dt_f 1.31/0.496n 0.83/0.855n 1.66/0.288n 
KEK KK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKKKKKK 
OUT3 MODEL 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKKKKKK 
[Model] OUT3 
Model_type Output 
Polarity Non-Inverting 
Enable Active-High 
| Signals ADS#, BLAST#, LOCK#, PCHK#, PLOCK#, W/R, TDO 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KK 
typ min max 
C_comp 6.1pF 6.1pF 6.1pF 
[Voltage range] 3.3V 2.7V 3.7V 


[FOC ICICI IC IC II II III II III III ICI ICI IC ICICI ICI ICI IC ICI II ICI I I I ae 


[Pulldown] 


Voltage I (typ) [ (min) I (max) 

-5V -2.27a -1.68a -2.6a 

-4.5V =, 99a -1.48a -2.28a 
-4V -1.7la -1.27a -1.95a 
=3.5V -1.43a =1. 07a -1.63a 
—3V ena -0.87a ed Sle 
= 280 -0.87a -0.66a -0.98a 
-2V -0.59a -0.46a -0.66a 
=1,59 “0.324 0.274 =0.354 
-1V -68.4ma -80.2ma -75.6ma 
-0.5V -23.2ma -14.6ma —32.2ma 
0 -7.2na -16.5na -12.7na 
0.0 22.2ma 13.3ma 31.4ma 
1vV 40.5ma 23.5ma 58.5ma 
1.5V 52.7ma 29.5ma 78.7ma 
2V 58.34ma 30.97ma 90.41ma 
2.5V 59.39ma 31.39ma 94.46ma 
3V 59.88ma 31.69ma 95.32ma 
3.5V 60.2ma 31.8ma 95.76ma 
4V 60.39ma 31.91ma 96.66ma 
4.5V 60.55ma 32ma 96.47ma 
5V 60.67ma 32.08ma 96.66ma 
Dev. 60.79ma 32.16ma 96.82ma 
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intel. 


6V 
6.5V 
7V 
ree) s 
8V 


[GND_clamp] 


[Pullup] 


Voltage 
-1.0V 
20. 9V. 
-0.8V 
a EV 
-0.6V 
=0, 5V 
-0.4V 


cco BE ow ee > A om ms ee <r 2 a Sr | 
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37ma 
44ma 
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7.54mA 
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=242 
=16 
=. 2 
=054.9 
20.36 
0.3 
OF y 
=95.. 
-94, 
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8mA 
6mA 


EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL 


[POWER_clamp] 
Voltage 
-2.7V 
-2.6V 
-2.5V 
-2.4V 
-2.3V 
-2.2V 
-2.1V 
-1.7V 


A. 


-2mA 


- OmA 


4.3mA 


0. 


- 6mA 
-OmA 
- OmA 


OmA 


I (min) 
18.65mA 
14.90mA 
10.90mA 
6.80mA 
2.80mA 
0.00mA 
0.00mA 
0.00mA 


-OmA 


KREKKKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KK 


[Ramp ] 

typ 
dv/dt_r 
dav/dt_f 


1.27/0.549n 
1.31/0.496n 


min 
0.87/1.044n 
0.83/0.855n 


max 


1.60/0.328n 
1.66/0.288n 


KREKKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KK 


I0O1 MODEL 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKKKKKKK 


[Model] Tol 
Model_type I/O 

Polarity Non-Inverting 
Enable Active-High 

| Signals A[21-31] 


Vinl = 0.8V 
Vinh = 2.0V 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKKK 


typ 
C_comp 3. 9pF 
[Voltage range] 323Ve 


[Pulldown] 
Voltage 
-5V 
-4.5V 
-4V 
-3.5V 
-3V 
-2.5V 
-2V 
-1.5V 
-1V 
-0.5V 
0 
0.5V 
1v 
1.5V 
2V 
2.9V) 
3V 
3.5V 
4v 
4.5V 


E-10 


I( 
=2 


ik 
-1. 
Shs 


=. 


-6 
-2 
ait 
22 


40. 
De 
Bo» 
59. 
Do's 


60 


60. 
60. 


typ) 
-27a 
99a 
71a 
43a 
-15a 
.87a 
-59a 
~32a 
8.4ma 
3.2ma 
.2na 
.2ma 
5ma 
7ma 
34ma 
39ma 
88ma 
.2ma 
39ma 
55ma 


min 
3.9pF 
2.7V 


I (min) 

-1.68a 

-1.48a 

-1.27a 

-1.07a 

-0.87a 

-0.66a 

-0.46a 

-0.27a 

-80.2ma 
-14.6ma 
-16.5na 
13.3ma 

23.5ma 

29.5ma 

30.97ma 
31.39ma 
31.69ma 
31.8ma 

31.91ma 
32ma 


max 
3.9pF 
3.7V 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKKK 


TI (max) 


ex 


=e: 


Fes 
ile, 


+1 


6a 

-28a 
95a 
63a 
31a 
-98a 
-66a 
35a 


5V 
5.5V 
6V 
6.0¥ 
7V 
PeaV 
8V 


[GND_clamp] 


[Pullup] 


Voltage 
-1.0V 
=0.29V 
-0.8V 
=O6 IV 
-0.6V 
=O), 5V 
-0.4V 
-0.1V 
0.0V 


Voltage 
8V 
7.5V 
pews 
6V 
S.oV 
5V 
4.5V 
AV 
3.3¥ 
3V 
2.5V 
2vV 
145V 
1V 
0,.5V 
Ov 
=0..5V 
-1v 
-1.5V 
=2V 
-2.5V 
=3V 
+3 55¥ 
-4V 
-4.5V 
=5V 
cae esl 
—6V 
-6.5V 


Oo Oo 2 6 Oo oc © 


DOaAUaUO BWDND 


324.0 
3244, 
S292 


8ma 
6ma 
3ma 


32.3ma 


ke ee 
32.4 
3245 


7ma 
4ma 
ima 


I (min) 


Boyar 
Os 
=3 84, 
-22. 
Sit: 
=3.9 
-4.2 


OOmA 
54mA 
38mA 
OOmA 
OOmA 
2mA 

3mA 


0.00mA 
0.00mA 


I (min) 


“Lis 
=1.4 
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=1..0 
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4 
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-14mA 
-16mA 
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7.33nA 
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85.3 
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ee 
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id 
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4mA 
7mA 
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8mA 
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EMBEDDED Intel486™ PROCESSOR FAMILY DEVELOPER’S MANUAL intel. 


[POWER_clamp] 


Voltage I (typ) I (min) I (max) 
-2.7V 14. 0mA 18.65mA 9. 9mA 
-2.6V 11.2ma 14.90maA 7.5mA 
a2 8.0mA 10.90mA 5.0mA 
-2.4V 4.3mA 6.80mA 2. 9mA 
-2.3V 1. 6mA 2.80mA 1.0mA 
a2 .2V 0.0mA 0.00mA 0.OmA 
-2.1V 0.0mA 0.00mA 0.O0mA 
es 0.O0mA 0.00mA 0.0mA 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKKK 
[Ramp ] 
typ min max 
dv/dt_r 1.27/0.549n 0.87/1.044n 1.60/0.328n 
dv/dt_f 1.31/0.496n 0.83/0.855n 1.66/0.288n 
KKK KK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKK 
I02 MODEL 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKKKKKK 
[Model] T02 
Model_type I/O 
Polarity Non-Inverting 
Enable Active-High 
Signals A[4-20] 


Vinl = 0.8V 
Vinh = 2.0V 


KEK KKK KKK KKK KKK KK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KK 


typ min max 
C_comp 6.1pF 6.1pF 6.1pF 
[Voltage range] 3.3V 2.7V 8207 
KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKK 
[Pulldown] 
Voltage I (typ) I (min) I (max) 
-5V -2.27a -1.68a -2.6a 
-4.5V =e 99a -1.48a -2.28a 
-4V -1.71la -1.27a -1.95a 
eae eek | -1.43a od OF 1, Boe 
=3V -l.15a -0.87a =l.3la 
=2.0V. -0.87a -0.66a -0.98a 
-2V -0.59a -0.46a -0.66a 
HT OV “0. 32a -0.27a “0.354 
-1V -68.4ma -80.2ma -75.6ma 
-0.5V -23.2ma -14.6ma -32.2ma 
0 -7.2na -16.5na -12.7na 
0.5V 22.2ma 13.3ma 31.4ma 
1v 40.5ma 23.5ma 58.5ma 
1.5V 52.7ma 29.5ma 78.7ma 
2V 58.34ma 30.97ma 90.41ma 
2.5V 59.39ma 31.39ma 94.46ma 
3V 59.88ma 31.69ma 95.32ma 
3.5V 60.2ma 31.8ma 95.76ma 
4V 60.39ma 31.91ma 96.66ma 
4.5V 60.55ma 32ma 96.47ma 


5V 
5.5V 
6V 
6.0¥ 
7V 
PeaV 
8V 


[GND_clamp] 


[Pullup] 


Voltage 
-1.0V 
=0.29V 
-0.8V 
=O6 IV 
-0.6V 
=O), 5V 
-0.4V 
-0.1V 
0.0V 


Voltage 
8V 
7.5V 
pews 
6V 
S.oV 
5V 
4.5V 
AV 
3.3¥ 
3V 
2.5V 
2vV 
145V 
1V 
0,.5V 
Ov 
=0..5V 
-1v 
-1.5V 
=2V 
-2.5V 
=3V 
+3 55¥ 
-4V 
-4.5V 
=5V 
cae esl 
—6V 
-6.5V 


Oo Oo 2 6 Oo oc © 


DOaAUaUO BWDND 
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3ma 


32.3ma 


ke ee 
32.4 
3245 
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Sit: 
=3.9 
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OOmA 
54mA 
38mA 
OOmA 
OOmA 
2mA 

3mA 


0.00mA 
0.00mA 
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99.2 
0.11 
0.12 
Pe 
ee 
14 
id 
6.4 


heal! CS ES ES 


8mA 
5mA 
2mA 
2mA 
4mA 
7mA 


5mA 


/O BUFFER MODELS 


I (max) 


et es 
=1,6 
ad ee 
=0.9 
=0..6 
=0 63 
=O7 ¢ 
“9Dy 
-94, 
=92 6 
389, 
=B3, 
=T1y 
=D2 « 
-“Z26% 
83.4 
20528 
Piemeo) 
86.3 
O%.11 
a Be 
re 
hd 
17 
me) 
+18 
6k 
+22. 
ee 


1. SS DS 


48mA 
62mA 
11mA 
13mA 
56mA 
82mA 
1imA 
1imA 
99mA 
inA 
8mA 
8mA 
6mA 
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[POWER_: 


clamp] 
Voltage 
-2.7V 
-2.6V 
-2.5V 
-2.4V 
-2.3V 
-2.2V 
-2.1V 
=i .7V: 


A. 


-2mA 


- OmA 


4.3mA 


0. 


- 6mA 
-OmA 
- OmA 


OmA 


I (min) 
18.65mA 
14.90mA 
10.90mA 
6.80mA 
2.80mA 
0.00mA 
0.00mA 
0.00mA 


-OmA 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKKK 


[Ramp ] 


dv/dt_r 
dv/dt_f 


typ 


1.27/0.549n 
1.31/0.496n 


KREKKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKKKKKK 


min 
0.87/1.044n 
0.83/0.855n 


I03 MODEL 


max 


1.60/0.328n 
1.66/0.288n 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKKKKKK 


[Model] 
Model_t 
Polarit 
Enable 
Signals 
Vinl = 
Vinh = 


ype 
Y 


0.8V 
2.0N 


103 
I/O 


Non-Inverting 


Active-High 


D[0-31], 


DP [0-3] 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKK 


C_comp 
[Voltag 


e range] 


typ 
3.9pF 
s23V 


min 
3.9pF 
2.7V 


max 


3.9pF 
3.7V 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKKKKKK 


[Pulldown] 


Voltage 
=BV 
-4.5V 
-4V 
eae eek | 
=3V 
-2.5V 
-2V 
-1.5V 
si 
=O. DV 
0 
0.5V 
1V 
1.5V 
2vV 
2.5Vi 
3V 
3.5V 
av 
4.5V 


I(typ) 


2, 
=i: 


-1. 
Shs 


=. 


-6 


-27a 
-99a 
71a 
43a 
-15a 
.87a 
-59a 
~32a 
8.4ma 


-23.2ma 


ait 
22 


40. 
De 
Bo» 
59. 
Do's 


60 


60. 
60. 


.2na 
.2ma 
5ma 
7ma 
34ma 
39ma 
88ma 
.2ma 
39ma 
55ma 


I (min) 

-1.68a 

-1.48a 

-1.27a 

-1.07a 

-0.87a 

-0.66a 

-0.46a 

-0.27a 

-80.2ma 
-14.6ma 
-16.5na 
13.3ma 

23.5ma 

29.5ma 

30.97ma 
31.39ma 
31.69ma 
31.8ma 

31.91ma 
32ma 


I( 


ex 


=e: 


Fes 
ile, 


+1 


max) 
6a 

-28a 
95a 
63a 
31a 
-98a 
-66a 
35a 


5V 
5.5V 
6V 
6.0¥ 
7V 
PeaV 
8V 


[GND_clamp] 


[Pullup] 


Voltage 
-1.0V 
=0.29V 
-0.8V 
=O6 IV 
-0.6V 
=O), 5V 
-0.4V 
-0.1V 
0.0V 


Voltage 
8V 
7.5V 
pews 
6V 
S.oV 
5V 
4.5V 
AV 
3.3¥ 
3V 
2.5V 
2vV 
145V 
1V 
0,.5V 
Ov 
=0..5V 
-1v 
-1.5V 
=2V 
-2.5V 
=3V 
+3 55¥ 
-4V 
-4.5V 
=5V 
cae esl 
—6V 
-6.5V 


Oo Oo 2 6 Oo oc © 


DOaAUaUO BWDND 


324.0 
3244, 
S292 


8ma 
6ma 
3ma 


32.3ma 


ke ee 
32.4 
3245 


7ma 
4ma 
ima 


I (min) 


Boyar 
Os 
=3 84, 
-22. 
Sit: 
=3.9 
-4.2 


OOmA 
54mA 
38mA 
OOmA 
OOmA 
2mA 

3mA 


0.00mA 
0.00mA 


I (min) 


“Lis 
=1.4 
cae! 
=1..0 
=O 28 
=0'.6 
-0.4 
=O 4 


4 
3 
3 
2 
2 
2 
2 
2 


-84mA 
-51mA 
.78mA 
.8mA 

-14mA 
-16mA 
-48mA 


7.33nA 


1669 
35.0 
5249 
69.8 
85.3 
99.2 
0.11 
0.12 
Pe 
ee 
14 
id 
6.4 


heal! CS ES ES 


8mA 
5mA 
2mA 
2mA 
4mA 
7mA 


5mA 


/O BUFFER MODELS 


I (max) 


et es 
=1,6 
ad ee 
=0.9 
=0..6 
=0 63 
=O7 ¢ 
“9Dy 
-94, 
=92 6 
389, 
=B3, 
=T1y 
=D2 « 
-“Z26% 
83.4 
20528 
Piemeo) 
86.3 
O%.11 
a Be 
re 
hd 
17 
me) 
+18 
6k 
+22. 
ee 


1. SS DS 


48mA 
62mA 
11mA 
13mA 
56mA 
82mA 
1imA 
1imA 
99mA 
inA 
8mA 
8mA 
6mA 
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[POWER_clamp] 


Voltage I (typ) I (min) I (max) 

-2.7V 14.0mA 18.65mA 9.9mA 

-2.6V 11.2mA 14.90mA 7.5mA 

a2 8.0mA 10.90mA 5.0mA 

-2.4V 4.3mA 6.80mA 2.9mA 

-2.3V 1.6mA 2.80mA 1.0mA 

-2.2V 0.0mA 0.00mA 0.OmA 

-2.1V 0.0mA 0.00mA 0.O0mA 

=i .7V: 0.O0mA 0.00mA 0.0mA 
KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKKKK KKK KKK KKK 
[Ramp ] 

typ min max 

dv/dt_r 1.27/0.549n 0.87/1.044n 1.60/0.328n 
dv/dt_f 1.31/0.496n 0.83/0.855n 1.66/0.288n 


KEK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKKKKKKK 


INPUT1 MODEL 


KEK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKKKK 


[Model] INPUT1 

Model_type Input 

Polarity Non-Inverting 

Enable Active-High 

Signals A20M#, AHOLD, BOFF#, BRDY#, BS8#, BS16#, FLUSH#, HOLD, IGNNE# 


INTR, KEN#, NMI, RDY#, RESET, EADS, TCK, TDI, TMS 
Vinl = 0.8V 
Vinh = 2.0V 


KEK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKKKKKKK 


typ min max 
C_comp 1.7pF 1. 7pF 1.7pF 
[Voltage range] 3.3V 2.8V 3.6V 


KREKKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKKKK 


[GND_clamp] 


Voltage I (typ) I (min) I (max) 
-1.0V —-64.0mA -77.00mA —55.00mA 
-0.9V —-47.0mA -57.54mA -37.23mA 
=0..8V -27.0mA —36.38mA -17.76mA 
-0.7V -15.0mA -22.00mA -7.60mA 
-0.6V -7.5mA -11.00mA -4.23mA 
-0.5V -4.0mA -5.92mA -2.54mA 
-0.4V -2.5mA -4.23mA -0.85mA 
021 0.0mA 0.00mA 0.00mA 
0.0V 0.0mA 0.00mA 0.00mA 
[POWER_clamp] 
Voltage I (typ) I (min) I (max) 
-2.7V 14.0mA 18.65mA 9.9mA 
-2.6V 11.2mA 14.90mA 7.5mA 
FL DV 8.0mA 10.90mA 5.0mA 
-2.4V 4.3mA 6.80mA 2.9mA 
-2.3V 1.6mA 2.80mA 1.0mA 
S252V' 0.0mA 0.00mA 0.O0mA 
-2.1V 0.0mA 0.00mA 0.0mA 
= SV, 0.0mA 0.00mA 0.O0mA 
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KEK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKKKKKKK 


INPUT2 MODEL 


KEK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKKKKKKK 


[Model] INPUT2 
Model_type Input 
Polarity Non-Inverting 
Enable Active-High 
Signals CLK 


Vinl = 0.8V 
Vinh = 2.0V 


KEK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KK KKK KKK KKK KKKKKKKKKKK 


typ min max 
C_comp 2.8pF 2.8pF 2.8pF 
[Voltage range] 3.3V 2.7V B.27V 
KEK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKKKKKKK 
[GND_clamp] 
Voltage I (typ) I (min) I (max) 
-1.0V —64.0mA —-77.00mA —55.00mA 
-0.9V —-47.0mA —-57.54mA -37.23mA 
-0.8V -27.0mA —36.38mA -17.76mA 
eae al -15.0mA —22.00mA —-7.60mA 
-0.6V —-7.5mA —11.00mA —-4.23mA 
4025V —4.0mA —5.92mA —-2.54mA 
-0.4V —2.5mA —-4.23mA —0.85mA 
a0 Le 0.OmA 0.00mA 0.00mA 
0.0V 0.OmA 0.00mA 0.00mA 
[POWER_clamp] 
Voltage I (typ) I (min) I (max) 
-2.7V 14.0mA 18.65mA 9.9mA 
-2.6V 11.2mA 14.90mA 7.5mA 
=2.5V 8.0mA 10.90mA 5.0mA 
-2.4V 4.3mA 6.80mA 2.9mA 
SB 1.6mA 2.80mA 1.0mA 
=262N 0.OmA 0.00mA 0.OmA 
2 ae 0.OmA 0.00mA 0.OmA 
[End] 
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APPENDIX F 
BSDL LISTINGS 


Below is a listing of a boundary scan description language (BSDL) file for the Write-Back En- 
hanced IntelDX4™ processor. 


This file is provided as an example. Contact Intel for design information for this and other 
Intel486™ processors. See Section B.5, “Intel486™ Processor Boundary Scan (JTAG),” in Ap- 
pendix B for a complete description of BSDL instructions and usage. 


F.1.| WRITE-BACK ENHANCED IntelDX4™ PROCESSOR LISTING 


-- Copyright Intel Corporation 1993 

KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKK KKK KKK KKK KKK 
-—- Intel Corporation makes no warranty for the use of its products 

-- and assumes no responsibility for any errors which may appear in 

-- this document nor does it make a commitment to update the information 

-- contained herein. 

KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKK KKKKKKKK KKK KKK KKK KKK 
-- Boundary-Scan Description Language (BSDL Version 0.0) is a de-facto 

-—- standard means of describing essential features of ANSI/IEEE 1149.1-1990 

-—- compliant devices. This language is under consideration by the IEEE for 

-—- formal inclusion within a supplement to the 1149.1-1990 standard. The 

—- generation of the supplement entails an extensive IEEE review and a formal 
-—- acceptance balloting procedure which may change the resultant form of the 
-—- language. Be aware that this process may extend well into 1993, and at 


-—- this time the IEEE does not endorse or hold an opinion on the language. 
KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KK KKK KKK KKK KKK KKK KKK KKK KKKKK KKK KKK KK KK 


—- Write-Back Enhanced IntelDX4(TIM) processor BSDL description 
-—- This file has been electrically verified. 


-—- Rev: 1.3 4/26/95 


entity WBE_INTELDX4 is 


generic(PHYSICAL_PIN_MAP : string := "PGA_17x17"); 
port (A20M fin bit; 

ABUS2 : out bit; 

ABUS3 : out bit; 

ABUS : inout bit_vector (4 to 31); -- Address bus (words) 

ADS : out bit; 

AHOLD = en birt; 

BE : out bit_vector(0 to 3); 

BLAST > out bit; 

BOFF : in bit; 

BRDY : in bites 

BREQ : out Die; 
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F-2 


Ww 
n 
foe) 


Q 


7HAOODOUAA 
QD 
1) 
n 


m "4 
E 
Ga 
Nn 
aa 


SMI 
SMIACT 
SRESET 
STPCLK 
TCK, TMS, 
TDO 

UP 
VCC_PGA 
vccs 
VOLDET 
VSS_PGA 
WR 

CACHE 
HITM 

INV 
WB_WT 


in bit; 
in bit; 
in bit; 
in bit; 
inout bit_vector(0 to 31); -- Data bus 
out bit; 
inout bit_vector(0 to 3); 
in bit; 
out bit; 
in bit; 
out bit; 
in bit; 
in bit; 
linkage bit; --Internal NC PGA 
in bit; 
in bit; 
out bit; 
out bit; 
linkage bit; -- No Connect for PGA 
in bit; 
out bit; 
out bit; 
out bit; 
out bit; 
in bit; 
in bit; 
in bit; -- new 
out bit; -- new 
in bit; -- new 
in bit; -- new 
TDI : in bit; -- Scan Port inputs 
out bit; -- Scan Port output 
in bit; 
linkage bit_vector(l to 23); =>. VCC 
linkage bit; --Reference Voltage 
linkage bit; --Voltage Detect Pin 
linkage bit_vector(l1 to 28); -- VSS 
out bit; 
out bit; --New pin 
out bit; --New pin 
in bit; --New pin 
in bit); —--New pin 


use STD_1149_1_1990.al11; 


attribute PIN_MAP of WBE_INTELDX4 


constant PGA_17x17 : PIN_MAP_STRING := 


"A20M 
"ABUS2 
"ABUS3 
"ABUS 


Dio," & 
Q14, " & 
Ribs," -& 


(S16, Q12, S15, Q13, R13, Q11, S13, R12," 


entity is PHYSICAL_PIN_MAP; 


& 


-- Define Pin Out of PGA 


tal. BSDL LISTINGS 


" S7,- °010;°:S5,;. (R7;- °O9;7 O34. (RS; 045-08}: “O5,.-" & 
" Q7, S3, Q6, R2, S2, Sil, Rl, P2, P3, Q1)," & 
"ADS > S17," & 


"AHOLD : Al7, "— & 

"BE : (K15, J16, J15, F17), " & 
"BLAST : R16, "— & 

"BOFF : D17, "— & 

"BRDY : H15, " & 

"BREQ : Q15, " & 

"BS8 : D116, " & 

"BS16 C17. vg 

"CACHE : B12, " & 

"CLK : C3, " & 

"CLKMUL : R17, " & 

"DBUS : (Pl, N2, Nl, H2, M3, J2, L2, L3, F2, Dl, E3, " & 
" Cl, G3, D2, K3, F3, J3, D3, C2, Bl, Al, B2, " & 
" A2, A4, A6, B6, C7, C6, C8, A8, C9, B8)," & 
"DC : M15, " & 

"DP : (N3, Fl, H3, A5), " & 
"EADS : B17, " & 

"FERR : C14, " & 

"FLUSH : C15, " & 

"HLDA : P15, " & 

"HOLD : E15, " & 

"IGNNE : AlS, " & 

"INC_PGA : Al3, " & 

"INTR : Al6, " & 

"KEN : F15, "— & 

"LOCK : N15, " & 

"MIO : N16, " & 

"NC_PGA  : C13, " & 

"NMI : B15, " & 

"PCD PITT; UE 

"PCHK 2 Q17, " & 

"PLOCK : Q16, " & 

"PWT : L15, "— & 

"RDY : Fl6, "— & 

"RESET : C16, " & 

"SMI : B10, " & 

"SMIACT : C12, " & 

"SRESET : C10, " & 

"STPCLK : G15, "| & 

"TCK : A3, " & 

"TDI : Al4, "— & 

"TDO : Bl6, "— & 

"TMS : B14, "— & 

"UP : Cll, " & 

"VCC_PGA : (R8, R9, R10, Rll, R14, P16, M2, M16, L16, K2, K16, " 


" H16, G2, G16, E2, E16, C4, C5, BT, B9, Bll, " 
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WVCC5S 5 
"VOLDET : 
"VSS_PGA 


"WB_WT : 
"HITM . 
"INV :. 
"WR 3 


attribute Tap_Sc 
attribute Tap_Sc 
attribute Tap_Sc 
attribute Tap_Sc 


attribute Instru 


attribute Instru 
"BYPASS (11 
"EXTEST (000 
"SAMPLE (000 
"IDCODE (00 
"RUNBIST (100 
"PRIVATE (00 


attribute Instru 


-- there is no 


attribute Instru 


attribute Instru 
"RUNBIST (reg 
"IR SCAN MUST 
"Assert RESET 
"result O=pas 
"clock CLK in 
"IR End State 
"DR End State 
"RINBIST leng 


attribute Idcode 
"0001" 
"100000101000 
"00000001001" 


"4"; 


attribute Regist 
"BIST[1] (RUN 


: (S6, S8, S9, $10, $11, $12, $14, R4, 2, P17, Ml, " 


ML7, Ll, Li), KI, Ki7; HT, “B17; Gt, Git. El, ELT," 


B3, B4, B5, A7, A9, All), " & 
B13," ~& 
Al2, " & 
AlO, " & 
NEw oN; 
an_In of TDI : signal is true; 
an_Mode of TMS : signal is true; 
an_Out of TDO : signal is true; 
an_Clock of TCK : signal is (25.0e6, BOTH); 


ction_Length of WBE_INTELDX4: entity is 4; 


ction_Opcode of WBE_INTELDX4: entity is 

1)," & 

0)," & 

1)," & 

0)," & 

0)," & 
1,0100,0101,0110,0111,1001,1010,1011,1100,1101,1110)"; 


ction_Capture of WBE_INTELDX4: entity is "0001"; 
nstruction_Disable attribute for WBE_INTELDX4 


ction_Private of WBE_INTELDX4: entity is "private"; 


ction_Usage of WBE_INTELDX4: entity is 
isters BIST; " & 
BE DONE WITH DEVICE IN HARD RESET;" & 

and do IR Scan, then release RESET and do DR Scan;" & 
s, 1=fail;" & 

Run_Test_Idle;" & 

= Pause-IR;" & 

= Run-Test/Idle;" & 
th in hex = 125000H x ratio of TCK/CPUCLK) "; 


Register of WBE_INTELDX4: entity is 


& —--version 
1000" & —- WBE_INTELDX4 part ID 
& —--manufacturers identity 


—-required by the standard 


er_Access of WBE_INTELDX4: entity is 
BIST)" 


intel. 


BSDL LISTINGS 


SEA Ae AG eRe eR He RI He IE Ie ASI RC IIE I A Ke RIE IE Ie A Ke RI IE He ASK KA He FE IE KATE IIE I RA HIER, AK RAE De BAEK HGR HE HY. 


--{ The first cell is closest to TDO 
SIG III AG He A FE III IE IE AIR I IE Ie AIR I II AIR IH Ie A ICR IE EI ICRA EO IC RIE IER I A ICR Ie IG Ke A I TIE He Fe, 


attribute Boundary_Cells of WBE_INTELDX4: 
attribute Boundary_Length of WBE_INTELDX4: 


entity is 
entity is 113; 


attribute Boundary_Register of WBE_INTELDX4: 


"0 

"1 

"2 

"3 

"4 

"5 

"6 

"7 

"8 

"9 

"10 
"11 
"12 
"13 
"14 
"15 
"16 
"17 
"18 
"19 
"20 
"21 
"22 
"23 
"24 
"25 
"26 
"27 
"28 
"29 
"30 
"31 
"32 
"33 
"34 
"35 
"36 
"37 
"38 
"39 
"40 
"41 
"42 


(BC_2, 
(BC_2, 
(BC_6, 
(BC_6, 
(BC_1, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 
(BC_6, 


ABU 
ABU 
ABU 
ABU 
UP 

ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
ABU 
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"43. (BC_6, DBUS(10), bidir, X, 2; ge Lay ™ & 
"44 (BC_6, DBUS(11), bidir, X, 2; oe Zagat & 
"45 (BC_6, DBUS(12), bidir, X, 2 yp Ee & 
"46 (BC_6, DBUS(13), bidir, X, 2, go a) Gg & 
"47  (BC_6, DBUS(14), bidir, X, Zi pe Tea & 
"48 (BC_6, DBUS(15), bidir, X, 2; ge By Ge™ & 
"49 (BC_6, DP(2), bidir, X, 2 yp )ig™ & 
"50 (BC_6, DBUS (16), bidir, X, ZF bem (Zidig & 
"51 (BC_6, DBUS(17), bidir,- °X; 2, ee. Ban & 
"52 (BC_6, DBUS(18), bidir, X, Zi ha Ti La)ig™ & 
"53  (BC_6, DBUS(19), bidir, X, 2; BZ ge & 
"54 (BC_6, DBUS(20), bidir, X, 2 2 | yoga & 
"55> (BC6, DBUS (21), bidir, X, 2, ee (Ldig & 
"56 (BC_6, “DBUS (22), bidir, X, 2, «Be & 
"57  (BC_6, DBUS(23), bidir, X, 2 ys Big ™ & 
"58. {BC-6, “DPY3) > bidir, X, 25 fi. Layee & 
"59 (BC_6, DBUS(24), bidir, X, Zi , @)y," & 
"60 (BC_6, DBUS(25), bidir, X, 25 gp oS)" & 
"61 (BC_6, DBUS(26), bidir, X, 2, vee Bee & 
"62 (BC_6, DBUS(27), bidir, X, 24 po Bye & 
"63  (BC_6, DBUS(28), bidir, X, 2; nm VL)ya & 
"64 (BC_6, DBUS(29), bidir, X, Zi y.. dig & 
"65  (BC_6, DBUS(30), bidir, X, 2, mo “Beg™ & 
"66 (BC_6, DBUS(31), bidir, xX, Zi, eee’) ra & 
"67 (BE-1, STPCLK, input, X)," & 

"68 (BC_1, IGNNE, input: ,:X) >." & 

"69 (BC_1, INV, input, X)," & 

"710 (BC_2, CACHE, output3, X, 110, dy Z)," & 
"71  (BC_2, FERR, Sutpucsy x, 108, Ty —2yy" & 
"72  (BC_1, SMI , input, X)," & 

"73, (BC_1, WB_WT , input, X)," & 

"74 (BC_2, HIIM, oubput3;, KX; 109; Ll, B)+™ & 
"75 (BC_2, SMIACT , eubputs, x 209, Ly “29 & 
"76 (BC_1, SRESET , input, X)," & 

"77 (BC_1, NMI, input, X)," & 

"78  (BC_1, INTR, input, X)," & 

"79 (BC_1, FLUSH, input, X)," & 

"80 (BC_1, RESET, input, X)," & 

"81 (BC_1, A20M, Input, X) 5" & 

"82  (BC_1, EADS, input, X)," & 

"83-5 “(BE <2;. PED; output3, X, 0, dg Bye" & 
"84 (BC_2, PWT, output3, X, Oy cdg -2)5-" & 
#35 (BC_2, DC, output3, X, 0, 1, Ziy ge & 
"86 (BC_2, MIO, output3, X, OF dye 2) y™ & 
"87  (BC_2, BE(3), GUEpUES, =X, “110, Tx .2by* & 
"88 (BC_2, BE(2), outputs; x, 110, Ty, -2)y* & 
"89 (BC_2, BE(1), output3, X, Oe Cig -2)g & 
"90 (BC_2, BE(0), Sutpucsy. x, PLO, Ty <2yy" & 
"O71 (BC_2, BREQ, output3, X, 109, 1, Z)y™ & 
"92 (BC_2, WR, output3, X, 110, Le Ziyi” & 
"93  (BC_2, HLDA, output3, X, 109, we Bie & 
"94 (BC_1, CLK, anputy. Xie" & 
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"95 (BC_1, AHOLD, input, X), & 

"96 (BG: _1; HOLD, input, X)," & 

"97 (BC_1, KEN, input, X)," & 

"98 (BC_1, RDY, input, X)," & 

"99 (BC.1, -CLEMUL; input, X)," & 

"100 (BCL, BSS, input, X)," & 

"101 (BC_1, BS16, input, X)," & 

MLO2: (BES A; BOFF, input, X)," & 

"103: (BC.1, BRDY, input, X)," & 

"104 (BC_2, PCHK, output3, X, 109, 1, Z)," & 

WLOS (BCu2, LOCK, QUepues,. Xp LO, Tye yy & 

"106 (BC_2, PLOCK, output3, X, 110, 1, Z)," & 

"107 (BC_2, BLAST, output3, X, 110, 1, Z)," & 

"108 (BC_2, ADS, output3, X, 110, 1, Z)," & 

"109 (BC_2, Xe, cont rot;-- 1)" & -- DISMISC 
NI10). (BGA2;,. . 3 control, 1)," & =+ DISBUS 
M1411 (BEL2;. control, 1)," & —- DISABUS 
W112. (BE2, x control, 1)"; -- DISWR 


end WBE_INTELDX4; 
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APPENDIX G 
SYSTEM DESIGN NOTES 


G.1) SMM ENVIRONMENT INITIALIZATION 


When the Intel486™ processors are operating in Real Mode, the physical address at which in- 
structions and data are fetched is determined by the segment register and an offset (i.e., CS and 
IP for instructions). When a new value is loaded into a segment register, the new value is shifted 
to the left by four bits and stored in a segment base register that corresponds to that particular 
segment (CSBASE, DSBASE, ESBASE, etc.). It is the value stored in the segment base register 
that is used to generate a physical address. For example, the linear address to be used for fetching 
instructions is determined by adding the value contained in the CS segment base register with the 
value in the IP register. 


When the processor is in Protected Mode, the segment registers are used as selectors to a descrip- 
tor table. Each descriptor in a descriptor table contains information about the segment in use, in- 
cluding the segment’s BASE address (i.e., CSBASE) and limit (the size of the segment), as well 
as the protection level, privileges, operand sizes, and the segment type. In Protected Mode, the 
linear address is determined by adding the base portion of the descriptor to the appropriate offset. 


When in System Management Mode, the processor operates in a pseudo-Real Mode, with address 
calculation performed in the Real Mode manner. However, the processor adds the value in the 
segment base register with the value in the EJP register, rather than the IP register, so there are 
no limits as to the segment size. The physical address of an instruction is obtained by adding the 
value in CSBASE to the value in EIP. 


When entering SMM, it may be necessary to initialize the segment registers to point to SMRAM 
(see Section 8.4.2, “Processor Environment” for their value on SMM entry). If SMBASE has not 
been relocated, then the necessary segment registers can be initialized to point to SMRAM by us- 
ing the value in the CS register, 3000H, which points to the SMRAM address space. 


When an SMI# occurs after SMBASE has been modified, CSBASE is loaded with the new value 
of SMBASE. However, the CS selector register still contains the value 3000H, not the value cor- 
responding to the new SMBASE. 


To initialize segment registers to point to the new SMRAM area, read the SMBASE value from 
the SMM state that was saved in memory. Because the data segment registers are initialized to 0, 
do not use them to access the SMM state save area. Instead, perform a read relative to the CS reg- 
ister by using a CS override prefix to a normal memory read. Although CS still contains 3000H, 
CSBASE contains the value of SMBASE, and CSBASE is used for the address generation. 


Once the value of SMBASE is obtained, it must be shifted to the right by four bits to get the ap- 
propriate value to be placed in the segment registers. The CS register itself can be initialized by 
executing a far jump instruction to an address within SMBASE, which causes CS to be reloaded 
with a value corresponding to SMBASE. 


Example E-1 describes one method of initializing the segment registers when SMBASE has been 
relocated. This method works if SMBASE is less than | Mbyte. 
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Example G-1. Initialization of Segment Registers within SMM 


;read the value of SMBASE from the state save area 
movsi,FEF8H;SMBASE slot in SMM state save area 
moveax,cs:[si];copy SMBASE from SMBASE:FEF8H to eax 


7scale the SMBASE value to a 16-bit quantity 
movcl,4 
roreax,cl ;scaled value of SMBASE now in ax 


;to load cs, execute a far jump to an address that has been stored 
jat memory location PTR_ADDR 


;store the SMBASE value and an offset to a memory location that can be used as 
yan indirect jump address 


movdi,PTR_ADDR;PTR_ADDR is the location used to 
; store the jump address 

movbx, OFFSET;OFFSET is the address where 
7execution continues after the 
;far jump 


movcs: [di],bx;store the offset for the far jump 

incdi 

incdi 

movcs: [di],ax;store the segment address for the 
;far jump, which is SMBASE 

movbx,PTR_ADDR;bx now contains the address of the 
; location holding the jump address 


;initialize DS and ES with the correct address of SMBASE 
movds, ax 
moves, ax 


;execute a far jump instruction to load the CS register 
jmpfar [bx]; jump to address stored at memory 
; location pointed to by bx 


;CS now contains the correct value of SMBASE, and execution continues from the 
;address SMBASE:OFFSET 
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G.2. ACCESSING SMRAM 


G.2.1_ Loading SMRAM with an Initial SMI Handler 


Under normal conditions, the SMRAM address space should be accessible by the processor only 
while it is in SMM mode. However, some provision must be made for invoking the initial SMM 
interrupt handler routine. 


Because System Management Mode must be transparent to all operating systems, the initial 
SMM handler must be loaded by the system BIOS. At some time during the power on sequence, 
the system BIOS must move the SMM handler routine from the BIOS ROM to the SMRAM. The 
system designer must provide a hardware mechanism that allows access to SMRAM while SMI- 
ACT# from the processor is inactive. One method is to provide an I/O port in the memory con- 
troller that forces memory cycles at a given address to be relocated to the SMRAM. Once the 
initial SMM handler has been loaded to SMRAM, the I/O port is disabled to protect against ac- 
cidental accesses to SMRAM. 


The system BIOS must provide an SMM handler at the address 38000H. If the system design 
takes advantage of the SMRAM relocation feature of the processor, this handler must change the 
SMBASE register in the SMM state save. Next, the BIOS must move the full featured SMM han- 
dler to the new address. An SMI# must be generated to change the SMBASE register before the 
BIOS passes control to the operating system. 


G.2.2 SMRAM Hidden from DMA and Bus Masters 


In a system that allows DMA or other devices to take control of the system bus, care must be tak- 
en to ensure that only the master processor can access SMRAM. When an external bus master 
requests use of the system bus (by asserting HOLD or BOFF#) while the processor is executing 
an SMM handler routine, the processor responds by passing control of the bus to the requesting 
device. The system memory controller must redirect any memory accesses that are not generated 
by the processor to normal system memory as if SMIACT# were inactive. 


DMA accesses to the SMRAM area must be redirected to the correct address space when the ini- 
tialization routine is loading SMRAM, as well as when the processor is in SMM. 


It is not recommended to block bus control requests when in SMM, because the increased bus 
access latency could cause compatibility issues with some software or expansion hardware. 


G.2.3. Accessing System Memory from within SMM 


In order to enter a suspend state in which power is removed from some or all of system memory, 
it is necessary for the processor to have access to the entire system address space from within 
SMM. Access to system memory from within SMM requires that the memory controller decode 
both SMIACT# and the processor address to determine accesses to SMRAM. Only those memory 
addresses that are defined as being SMRAM space are directed to SMRAM. If SMRAM is locat- 
ed at an address that overlays normal system memory address space (see Section 8.6.1, “SMRAM 
Interface’), the processor must have a method of accessing both SMRAM (for code reads) and 
system memory simultaneously. 
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Figure G-1. Blocking Other Bus Masters from Accessing SMRAM 


G.2.3.1 Hardware Method 


Ideally, a method of accessing system memory that is mapped underneath SMRAM is provided 
by the system memory controller. The memory controller would provide a register that allows 
system memory at a given address to be remapped to a different address, which is not overlaid by 
SMRAM. When the SMM handler implements a suspend, it would first move all of system mem- 
ory that is not underneath SMRAM to a non-volatile medium (such as a hard disk drive). Next, 
the SMRAM image would be transferred to the non-volatile medium. Finally, the memory under- 
neath SMRAM would be accessed and copied to the non-volatile medium with a processor read 
to the remap address space, which is redirected to the overlaid system memory (see Figure G-2). 


If the memory controller does not provide a method of accessing overlaid system memory, it is 
possible to implement a software procedure to accomplish the same goal. However, the software 
method is complex; a hardware method is preferred. A description of the software method fol- 
lows. 


G.2.3.2 Software Method 


The ability to access the system memory that is located in the address space under SMRAM re- 
quires a method of resuming from SMM to a predetermined address space. This can be accom- 
plished with the following procedure. 


When resuming from SMM, the processor continues execution at the address contained in the CS 
and EIP slots within the SMM state save. However, the resume address cannot be changed by 
simply modifying the CS and EIP slots, because the processor uses the CS descriptor to determine 
the actual resume address. The descriptor registers are stored in reserved slots in the SMM state 
save, and they cannot be directly modified. 
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By replacing the suspend state save with a previously obtained image of a state save that returns 
to a known location, the SMM suspend handler can force a return to a given address, as explained 
in the following sequence: 


ls 


During initial system power up, execute an SMI# from a predetermined address (the 
address immediately preceding the address to which you later wish to resume). This can 
be accomplished by generating an SMI# in response to an I/O instruction or executing a 
halt instruction and using an SMI# to exit the halt state. 


Save the state save from this SMM to a safe location (SMRAM). 


When the system must resume to a given address from another SMI#, the stored state save 
can be substituted for the state save generated from that particular SMM. 


Normal 
Memory 


Remap Address for 


Remap Feature Overlaid Region 


CPU Accesses Disabled 
to Remap <a A 
Address Space Remap Feature B << SM MAN 
Enabled — > ; 
Overlaid Region A 


Normal 
Memory 


242202-167 


Figure G-2. Remapping Memory that is Overlaid by SMRAM 


Now that SMM can be resumed at a predetermined address, access the entire system memory 
space from within SMM before executing a suspend: 


1. 


During a suspend SMM, save all system memory except that which is located underneath 
SMRAM to a specified (and reserved) section of the hard disk. The ability to access 
system memory requires the memory controller to decode both SMIACT# and the 
processor address, and direct a limited section (64 or 128 Kbytes) of the processor address 
space to SMRAM. All other processor memory accesses should go to normal system 
memory. 


Save the contents of the SMM state save to the hard disk. 
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Modify the SMM state save so the RSM instruction returns to a predefined address, that is 
not in the application that was interrupted. The code at this address must contain the 
remainder of the suspend SMM handler. The predefined address can be anywhere in the 
processor address space, because the contents of system memory have already been saved 
to disk. 


Execute an RSM instruction, which exits SMM and returns control to a predetermined 
address (that must contain the rest of the SMM suspend handler). 


Save the rest of system memory (that which is located underneath SMRAM) to the hard 
disk. This address space can now be accessed with normal move instructions, because we 
are no longer in SMM. 


Save a flag (in CMOS memory) to indicate that the next reset should cause a resume from 
suspend. 


Power-down the memory (and possibly the processor). 


When power is restored, the processor is reset and begins execution of the power on self- 
test (POST) in BIOS. Early in the POST, the system should check the status of the 
suspend flag. 


Load a preliminary SMM handler to location 38000H and generate an SMI#. The SMM 
handler should read the SMBASE slot from the SMM state save that was stored to hard 
disk. SMBASE is then modified to point to the final SMRAM location and the system 
resumes from SMM to the system BIOS. 


Restore the contents of system memory located underneath SMRAM from the hard disk. 


Generate a second SMI#, which executes an SMM handler at the original value of 
SMBASE (before the suspend SMM). The SMM handler restores the contents of the rest 
of system memory from the hard disk, and then restores the original SMM state save to the 
SMM state save area in SMRAM, discarding the most recent SMM state save. 


Execute an RSM instruction, which returns execution to the application that was 
interrupted by the suspend request. 


INTERRUPTS AND EXCEPTIONS DURING SMM HANDLER ROUTINES 


To ensure transparency to existing system software, the SMM handler should not depend on in- 
terrupt or exception handlers provided by the operating system. However, in some cases it may 
be necessary to service interrupts or exceptions while in System Management Mode. In these cas- 
es, SMM-compliant interrupt and exception handlers, as well as an SMM-compliant interrupt 
vector table, should be provided. 
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G.3.1. SMM-Compliant Vector Tables 


An SMI# interrupt request can be generated while code is running under any of the other three 
processor operating modes (Real, Virtual-86, or Protected). When entering the SMM handler, the 
processor enters a pseudo-real mode, and the beginning of the interrupt vector table must be lo- 
cated at the address OOOOOOOOH. Before allowing any interrupts or exceptions to occur, the SMM 
handler routine must provide a valid interrupt vector table. Any code that is executed before set- 
ting up an SMM-compliant interrupt vector table must be written carefully to ensure that no ex- 
ceptions are generated. 


The system memory controller could relocate accesses to the SMM interrupt vector table to a lo- 
cation within SMRAM. In this case, when SMIACT# is active, all accesses to the lowest 1 Kbyte 
of the processor address space would be redirected to SMRAM, which would contain an 
SMM-compliant vector table that has already been initialized. 


If the system memory controller does not redirect interrupt vector table reads to an address within 
SMRAM, there are three steps required to provide an SMM-compliant interrupt vector table: 


1. Save the contents of memory to SMRAM at address OOOOO000H. 


2. Provide vectors for any possible interrupts or exceptions at the appropriate location in the 
vector table. 


3. Restore the original memory contents from SMRAM before exiting the SMM handler 
routine. 


G.3.2 Interrupts and Subroutines with SMRAM Relocation 


There is an additional issue that must be considered if interrupts or exceptions are to be executed 
within SMM and SMRAM has been relocated. Interrupt or subroutine calls from within SMM 
operate in a manner similar to Real Mode. When a subroutine is called or an interrupt is recog- 
nized, the 16-bit CS and IP registers are pushed onto the stack to provide a return address. 


When SMRAM is relocated to an address space above 1 Mbyte and an interrupt or subroutine call 
occurs, only 16 bits of the EIP register are pushed onto the stack. When returning from the sub- 
routine or interrupt, the processor vectors to a location where the upper 16 bits of EIP are zero. 
This can be avoided for subroutines by using an address size override before calling the subrou- 
tine. However, the issue remains for interrupts. 
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G.4__IntelDX2™ AND IntelDX4™ PROCESSOR FLOATING-POINT OPERATION 
AND SMM 


G.4.1. The Need to Save the FPU Environment 


When the processor enters System Management Mode, the context information for the interrupt- 
ed application is automatically saved to a specific state save address. When the SMM handler re- 
turns control to the interrupted application by executing the RSM instruction, the context 
information from the interrupted application is restored to the processor by reading from the state 
save location. This mechanism allows the SMM handler routine to modify most of the processor 
registers without the need to explicitly save them to memory. However, the registers in the pro- 
cessor’s Floating-Point Unit (FPU) are not automatically saved when the processor enters SMM. 
If the SMM handler must modify any of the registers in the FPU, or if the register data will be 
lost due to entering a power down state, the SMM handler must first explicitly save the FPU state 
as it existed in the interrupted application. 


There are two instances in which an SMM handler routine must be aware of the Floating-Point 
Unit (FPU): 


1. When removing power from the processor / FPU for the purpose of executing a suspend 
sequence. 


2. When the SMM handler uses FPU instructions. 


In both of these cases, the SMM handler must save the state of the FPU as it was left by the in- 
terrupted application. 


The information stored by the FPU state save instructions (FSAVE, FNSAVE, FSTENV, and 
FNSTENV) is dependent on the operating mode of the processor. The FPU state save instructions 
store the FPU state information in one of four formats: 16-bit Real Mode, 32-bit Real Mode, 
16-bit Protected Mode, or 32-bit Protected Mode, depending on the processor operating mode. 
The content of the information saved also varies slightly, depending on the processor operating 
mode in which the save instruction was executed. For example, the 32-bit Protected Mode FN- 
SAVE instruction saves the address of the last executed FPU instruction and its operands in the 
form of a segment selector and a 32-bit offset. In contrast, the 16-bit Real Mode FNSAVE in- 
struction saves the address information in the form of a 20-bit physical address. Because the for- 
mat in which the FPU state restore instructions (FRSTOR and FLDENV) recall the information 
also depends on the operating mode of the processor, the save and restore instructions must be 
executed from the same processor operating mode. 
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G.4.2 Saving the State of the Floating-Point Unit 


When an SMM handler routine must save the state of the Floating-Point Unit, it must save all 
FPU state information necessary for the interrupted application to continue processing. This state 
information includes the contents of the Floating-Point Unit stack, which requires use of the FN- 
SAVE or FSAVE instruction (FSTENV does not save the contents of the FPU stack). If the last 
executed non-control floating-point instruction caused an error (such as a divide by 0), the saved 
information must also include the address of the failing instruction and the addresses of any op- 
erands for that instruction. Without these addresses, it would be impossible for the FPU exception 
handler of the interrupted application to correct the error and restart the instruction. 


The FNSAVE and FSAVE instructions differ in that FNSAVE does not wait for the FPU to check 
for an existing error condition before storing the FPU environment information. If an unmasked 
FPU exception condition is pending, execution of the FSAVE instruction forces the processor to 
wait until the error condition is cleared by the software exception handler. Because the processor 
is in System Management Mode, the appropriate exception handler is not available, and the FPU 
error is not corrected in the manner expected by the interrupted application program. For this rea- 
son, the FNSAVE instruction should be used when saving the environment of the FPU within 
SMM. 


Because the SMM handler does not know the processor mode in which the interrupted application 
was executing (16- or 32-bit, Real or Protected), the SMM handler must execute the FNSAVE 
instruction in a mode in which all FPU state information is stored. The 32-bit Protected Mode 
format of the FNSAVE instruction is a superset of all other formats of the FNSAVE instruction. 
Therefore, executing the 32-bit Protected Mode FNSAVE instruction ensures that all FPU state 
information is saved. 


Executing the FNSAVE instruction in 32-bit Protected Mode requires that the processor be tem- 
porarily placed in Protected Mode. Rather than perform all of the setup details and overhead nec- 
essary to place the processor into Protected Mode, including the initialization of all descriptors 
and descriptor tables, it is possible to temporarily place the processor into Protected Mode for the 
purpose of executing only a few carefully written instructions. This can be accomplished by set- 
ting the PE bit in the CRO register, and then executing a short jump to clear the instruction pipe- 
lines. 


It is important to note that any instruction that modifies a segment register will cause the proces- 
sor to attempt to load a new descriptor from the descriptor table. (The occurrence of an interrupt 
or an exception would cause the processor to load a new descriptor, so interrupts must be disabled 
during this sequence.) Because neither the descriptors nor the descriptor table have been initial- 
ized, this would cause the system to crash. Therefore, all segment registers that are to be used in 
the FPU state save instructions must be initialized before entering Protected Mode. 


Example G-2 is an example of code that can be used to place the processor in Protected Mode 
and save the FPU state. 


Note that the no wait form (FNSAVE) of the save instruction must be used. In the event that the 
previous FPU instruction caused a Floating-Point error, we do not want to wait for this error to 
be serviced before executing the save instruction. Additionally, if the FSAVE instruction were 
used, the operand size override prefix would be incorrectly applied to the implicit WAIT instruc- 
tion that precedes FSAVE, rather than to the save instruction itself. 
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Before exiting the SMM handler and returning to the interrupted application, the register contents 
of the Floating-Point Unit must be returned to their previous values. This can be accomplished 
by executing the 32-bit Protected Mode format of the FRSTOR instruction. Example G-3 gives 
an example code segment that can be used to restore the FPU to the state in which it was inter- 
rupted by the SMI request. 


Note that the no wait form (FNRSTOR) of the restore instruction must be used. If the FRSTOR 
instruction were used, the operand size override prefix would be incorrectly applied to the implic- 
it WAIT instruction that precedes FRSTOR, rather than to the save instruction itself. 
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G.5 SUPPORT FOR POWER-MANAGED PERIPHERALS 


G.5.1 Shadow Registers 


Before power is removed from any device, the state of that device must be saved in a protected 
memory space so that the device can be reinitialized to its previous state. If a peripheral contains 
a write-only register, the value in that register can be recovered by providing shadow registers 
that are both readable and writeable. 


These shadow registers should be updated every time the peripheral registers are written, but they 
have no function other than tracking the data written to a particular register. 


Example G-2. Saving the FPU State in 32-Bit Protected Mode 


;first initialize the registers used to store the state save information 


movdx, SEGMENT ; SEGMENT is the segment to be used by 
;the save instruction, 

movds, dx ; normally it should point to SMRAM 

movsi, OFFSET ;OFFSET is the offset used in the save 


; instruction 


;set the PE bit in CRO 
moveax, cr0 ;read the old value of CRO 
or eax,00000001H ;set the PE bit 
movcr0O, eax 


;enter protected mode by executing a short jump to clear the prefetch queue 
jmpprotect 


protect: 


;we can now save the state of the FPU in the protected mode format 


db 66H ;use an operand size override prefix 
;to set 32-bit format 
fnsave[si] ;FPU state saved to SEGMENT:OFFSET 


;now return to real mode to continue with the SMM handler (no jump is 
; required) 


moveax, cr0 ;clear the PE bit in CRO 
andeax, OFFFFFFFEH 
movcr0,eax 


In addition to the write only registers in a system, there are several other registers that must be 
shadowed. Any device that requires registers to be programmed in a particular sequence must 
also have its registers shadowed. Examples in a typical personal computer include the program- 
mable interrupt controller, the DMA controller, and the programmable timer/counter. 
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It is also possible to perform shadowing of some write only registers using SMM. Any time a 
write cycle is generated to a write only register, the system can generate an SMI#. The SMM han- 
dler can use the processor state information saved in the SMM state save to save the data from 
the interrupted I/O cycle to a predetermined location in the SMRAM space. 


Example G-3. Restoring the FPU State from a 32-Bit Protected Mode Save 


;first initialize the registers used to recall the state save information 


movdx, SEGMENT ; SEGMENT is the segment to be used by 
;the restore instruction, 

movds, dx ;normally it should point to SMRAM 

movsi, OFFSET ;OFFSET is the offset used in the 


;restore instruction 
;set the PE bit in CRO 


moveax,cr0O ;read the old value of CRO 
or eax,00000001H;set the PE bit 
movcr0O, eax 


;enter protected mode by executing a short jump to clear the prefetch queue 


jmpprotect 
protect: 


;we can now recall the state of the FPU from the previous FNSAVE instruction 
7 (in the protected mode format) 


db 66H ;use an operand size override prefix 
;to set 32-bit format 
fnrstor[si] ;FPU state restored from 


7; SEGMENT : OFFSET 


;now return to real mode to continue with the SMM handler (no jump is 
; required) 


moveax,cr0;clear the PE bit in CRO 
andeax, FFFFFFFEH 


mov cr0,eax 


The information contained in the SMM state save can be used (with the knowledge that the SMI# 
was in response to an I/O write instruction) to determine both the address and the data of the in- 
terrupted write instruction. The SMM handler can examine the OPCODEs of previous instruc- 
tions by decrementing the IP (or EIP) register. Once the correct OPCODE is determined, it can 
be used with the values in the EAX and DX slots of the SMM state save to update the information 
in the memory used to shadow the I/O register. I/O write instructions occur in one of three forms: 
1) a write to an address that is specified in the OPCODE; 2) a write to an address contained in the 
DX register; or 3) a string write to an address contained in the DX register. 
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The I/O write instructions have the following OPCODEs: 


Table G-1. TI/O Write Instruction OPCODEs 


Instruction OPCODE Notes 
OUT x, al E6x x is the address of the I/O port 
OUT x,ax E7x x is the address of the I/O port 
OUT x,eax E7x x is the address of the I/O port 
OUT dx,al EE 
OUT dx,ax EF 
OUT dx,eax EF 
OUTSB 6E 
OUTSW 6F 
OUTSD 6F 


The SMM handler must know whether a particular I/O port is 16 or 32 bits in order to distinguish 
between 16 and 32 bit I/O write cycles. 


The SMM handler can decrement the value of IP contained in the state save, and then examine 
the memory contents at that address. If the SMM handler knows that the last instruction was an 
I/O write instruction, and writes to I/O addresses 6EH, 6FH, OEEH, and OEFH will not cause an 
SMI#, it can use the SMM state save data for EAX and EDX to reconstruct the last instruction. 


G.5.2 Handling Interrupted I/O Write Sequences 


In a typical personal computer, there are several hardware devices that require the control regis- 
ters for that device to be programmed in a particular order. For example, the interrupt controller, 
the DMA controller, the programmable timer/counter, the keyboard controller, and the real time 
clock all require a series of accesses to properly initialize the registers in that particular device. 
Some of these devices may require successive accesses to registers located at different addresses, 
while others may require several control registers to be programmed through write cycles to the 
same address. 


If an SMI request interrupts an application that is in the process of initializing the registers in one 
of these devices, special care must be taken to ensure that the peripheral is returned to its original 
state when control is returned to the interrupted program. For some SMM handler events, it may 
be necessary to power down the device or change the state of a register within the device. In these 
cases, the SMM handler must return control to the interrupted application in such a way that the 
application can continue with the correct sequential access in the interrupted sequence. 


To accomplish this, the SMM handler must restore the original values of all registers in the de- 
vice, and restart the interrupted sequence so that the application may continue where it left off. 
This requires system hardware to shadow all registers that need to be accessed in the sequence, 
keep an index indicating which position in the sequence the register occupies, and keep a pointer 
so that SMM software knows to which register the last access was directed. This pointer would 
indicate the last register of each sequence that was programmed in the particular peripheral. 
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For example, programming the master interrupt controller requires a write to I/O port 20H 
(ICW 1) followed by four write cycles to I/O port 21H (ICW2, ICW3, IC W4, and OCW1). If this 
sequence is interrupted by an SMI request, and the resulting SMM handler either modifies or 
powers down the interrupt controller, the SMM handler must return control to the interrupted ap- 
plication such that the following access to the interrupt controller would access the correct regis- 
ter in the sequence. System hardware must save the contents of each of the registers, as well as a 
pointer indicating which register was last written. 


Before returning control to the interrupted application, the SMM handler must initialize ICW1-— 
ICW4 and OCW1 to their previous values. It would then re-write the appropriate registers so that 
the first access by the application program would be to the location in the sequence following the 
last location it programmed before it was interrupted by the SMI request. 


A similar procedure must be followed for each of the peripherals that require control registers to 
be initialized in a particular order. 
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